New Excitement and Fun Ways to Enjoy Video and Audio Content

AI Sound Separation x Entertainment

AI that can separate specific sounds from a mixed source has been adopted in devices and apps, providing great convenience to users. Furthermore, we are expanding its possibilities in the field of entertainment, such as remastering classic movie soundtracks and creating a new karaoke distribution service.

Yuki Mitsufuji/Tatsuya Haraguchi/Yoshikazu Takashima

Yuki Mitsufuji (left) Distinguished Engineer and Deputy General Manager,
Tokyo Laboratory 21, R&D Center, Sony Group Corporation

Tatsuya Haraguchi (center) Executive Vice President,
EdgeTech Project Group, Sony Music Entertainment (Japan) Inc.

Yoshikazu Takashima (right) Vice President, Advanced Technology,
Sony Pictures Entertainment, Entertainment Innovation & Technology Group,
Technology Development

Achieving low complexity, low latency, and high real-time performance.

Mitsufuji: Normally, you would think that "AI sound separation" requires a lot of computational power, but we have been focusing on how to do it with a small amount of computation in a compact size. Of course, this is not easy, because sound is very intricate. In addition to low computational cost, our separation technology features low latency and high real-time performance. One of the results of this work is the "Intelligent Wind Filter" on the Xperia 1 II smartphone. Conventionally, a wind-proof microphone was necessary to reduce wind noise whenever filming a video outdoors. However, the wind noise elimination function built into the smartphone makes it possible to record videos outdoors using only the Xperia 1 II itself. In addition, the AI sound separation technology that separates human voices from ambient noise has been included in the smartphone intercom app "Callsign", which is mainly used for business-to-business (B2B) communications.

Classic movies brought back to life with immersive audio.

Takashima: I first came into contact with AI sound separation in the summer of 2018. I had just come to an impasse working on the problem of converting soundtracks of classic movies into immersive sound, and my first reaction was "This is the answer!" The following year, we began using AI sound separation in our first commercial project, the remastering of world-famous classic movies including "Gandhi" and "Lawrence of Arabia", which were released overseas in June 2020 as a 4K Ultra HD collection. You can now enjoy immersive sound effects with these titles. Customers who want 4K movies also have great expectations for immersive sound remixes, which have received high ratings. It is fair to say that AI has once again enhanced the value of older films.

The 4K UHD versions of Lawrence of Arabia and Gandhi included in the Columbia Classics Collection Vol. 1.

Mitsufuji: Until now, we had mostly been working on AI sound separation for music. Movies, on the other hand, contain various sounds and sound effects, so collecting data, training an AI system on it, and applying it was a completely new challenge. But thanks to this effort, we have broadened the range of applications for AI sound separation technology. "Lawrence of Arabia" is a 1962 movie. We had never dealt with such an old sound source before. There are many other films considered to be a type of cultural heritage that cannot be seen in modern formats. In this project, we recognized a new possibility for audio source separation to increase the value of past film assets, and we were able to make it happen.

Takashima: Until now, sound separation had to be done by hand. At first, people involved in the project, including sound creators and engineers, were always doubtful and asked, "Can AI do such a thing?" But now, on the contrary, whenever they have a problem, they come over and ask, "Can't we make use of AI?" For example, it is difficult to eliminate unwanted noise from things such as planes, cars, and echoes from walls that gets mixed in while filming a movie scene on location, but we have begun working hard to process these sounds more efficiently using AI. AI could also be used when creating dubbed versions of new titles. Today, we manually extract the voices of people from the original version and then overdub in a different language. It would be helpful if AI could assist us with that work. It would be wonderful if creators, with the support of AI, were able to devote more time and attention to the creative side of their work.

A new karaoke experience on your smartphone.

Haraguchi: LINE MUSIC introduced a "karaoke function" in August 2020. It provides a karaoke-like experience by first removing the vocals from a song as it plays and then mixing the user's voice into the sound source for playback.
In regular karaoke, dedicated background music sources are prepared, but with this service, you can sing along with the same original performance found on CDs. When LINE MUSIC staff members were searching for technology that could be used for new functions, they found that Sony had an AI sound separation technology, which ultimately led to the realization of this service.
*LINE MUSIC Corporation is a company funded by LINE Corporation and three domestic music companies, including Sony Music Entertainment (Japan) Inc. The subscription-based on-demand music streaming service "LINE MUSIC" is currently in service.

Mitsufuji: Being able to run the service in real time on smartphones was a large technical hurdle to overcome. It has to work not only with the Xperia™, but also with smartphones from other companies. Furthermore, we had to make it work on smartphones that have come into widespread use over the past few years, so it took a great deal of hard work examining all the specifications.

Haraguchi: We used a method that allows users to remove the vocals simply by operating the app, instead of having to generate and deliver songs without vocals on the server side. In theory, this method makes it possible to enjoy karaoke with every song being distributed, resulting in an astounding number of songs compared with traditional karaoke. The same function has already begun to be offered by LINE MUSIC in Taiwan, and will continue to spread around the world from now on.

Mitsufuji: I think it has also played a role in proposing a new type of entertainment different from traditional karaoke. And more than anything, I am looking forward to seeing the reaction of users who can now sing songs that they "wanted to sing, but had no karaoke version".

Data is available for learning, and it can be used for performance advancement.
Synergy unique to Sony Group speeds up AI development.

Mitsufuji: Having an abundance of the sound sources needed to develop AI for high-performance audio source separation within the Sony Group is extremely helpful. Since sound source files are not archived in the form of datasets for AI, a lot of technical effort is required to make them useful for AI training. We are working together with a team at the music production platform "Soundmain", which has been utilizing blockchain and AI technology for music production and rights management. Sony Group's synergies are making great contributions to the development of AI because there are people within each Sony Group company, like Mr. Takashima and Mr. Haraguchi, who are proactively working on the utilization of technology. These people have become vital bridges for R&D staff members. I would like to continue working with people from various companies and create new kinds of value.
