
Wearable AI makes your music experience immersive

DSEE Extreme™

Headphones are the wearable devices most familiar to people. Incorporating Sony's unique sensing technology and AI into them changes not only the relationship between people and music but also their daily lives.

Hiroshi Ohba/Toru Chinen/Yuki Yamamoto

Hiroshi Ohba (left), Sony Corporation

Toru Chinen (center) / Yuki Yamamoto (right), Tokyo Laboratory 20, Sony Group Corporation R&D Center

Restoring downscaled audio into beautiful high-resolution audio

Yamamoto: "DSEE HX™" (Digital Sound Enhancement Engine) is Sony's proprietary high-quality sound technology that creates a high-resolution quality realism to CDs and compressed audio sources by restoring information in music data back to near original form.
Originally, this restoration was done through prediction. However, vocal sounds and drum sounds, for example, have very different characteristics, so the signals that should be predicted also differ. That is where "DSEE Extreme" achieves an even greater improvement in sound quality: it uses AI to distinguish different kinds of sound and apply the optimal processing to each. To see just how far AI could improve audio, we started development by drawing on deep learning for images. We first completed an "utmost version" that reached a level with no further room for improvement in quality, but it was built on a grand scale that ignored any practical limits on computation or memory.
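As a rough illustration of the approach Yamamoto describes, distinguishing sound types and applying different processing to each, here is a minimal Python sketch. The classifier heuristic, the gain values, and the band-mirroring "restoration" are hypothetical placeholders, not DSEE Extreme's actual processing.

# Toy sketch: classify each audio frame by sound type, then apply a
# type-specific restoration step. Everything here is illustrative.
import numpy as np

def classify_frame(frame: np.ndarray) -> str:
    """Stand-in for the AI classifier: pick a sound type per frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    # Percussive frames tend to have a flatter spectrum than tonal/vocal ones.
    flatness = np.exp(np.mean(np.log(spectrum + 1e-9))) / (np.mean(spectrum) + 1e-9)
    if flatness > 0.5:
        return "percussive"
    low_band = spectrum[:len(spectrum) // 4].sum()
    return "vocal" if low_band > spectrum.sum() * 0.6 else "other"

def restore_frame(frame: np.ndarray, sound_type: str) -> np.ndarray:
    """Type-specific high-band 'restoration' (here: a toy harmonic fill)."""
    gain = {"vocal": 0.3, "percussive": 0.6, "other": 0.4}[sound_type]
    spectrum = np.fft.rfft(frame)
    half = len(spectrum) // 2
    # Mirror the lower band into the (lost) upper band with a type-dependent gain.
    spectrum[half:half * 2] += gain * spectrum[:half]
    return np.fft.irfft(spectrum, n=len(frame))

def enhance(signal: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    out = np.copy(signal)
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        out[start:start + frame_len] = restore_frame(frame, classify_frame(frame))
    return out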

Chinen: We had achieved an advanced AI that could be called the "utmost version with the highest sound quality", but its real-time processing demanded so much computation that it was too heavy to run inside products. At that point, I asked Mr. Yamamoto to research how to make everything compact without losing sound quality. Using high-resolution audio content from Sony Music Entertainment as training data, we could evaluate how closely sounds restored from compressed data such as MP3 matched their original high-resolution sources. While confirming the quality numerically, we also verified the sound with our own ears. At the same time, we carefully trimmed any waste from the neural networks.
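A minimal sketch of the two activities Chinen describes, scoring numerically how close a restored signal is to the high-resolution original and trimming waste from a network, might look like the following. The log-spectral-distance metric and magnitude pruning shown here are generic techniques chosen for illustration, not Sony's actual criteria.

# Toy sketch: (1) numeric closeness of restored audio to the original,
# (2) pruning near-zero weights to shrink compute and memory.
import numpy as np

def log_spectral_distance(restored: np.ndarray, original: np.ndarray) -> float:
    """Lower is better: spectral deviation of the restored signal from the original."""
    r = np.abs(np.fft.rfft(restored)) + 1e-9
    o = np.abs(np.fft.rfft(original)) + 1e-9
    return float(np.sqrt(np.mean((20 * np.log10(r / o)) ** 2)))

def prune_small_weights(weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping only keep_ratio of them."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_ratio)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)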

Yamamoto: A big breakthrough came from handling the difference between the 2D signals of images and the 1D signals of audio. You could say we took deep learning created for images and adapted it to audio. Finally, in 2018, we were able to include this technology in the Walkman.
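To make the 2D-to-1D adaptation concrete, the sketch below shows the same convolutional building block written once for images and once for audio, using PyTorch. The layer sizes are arbitrary placeholders, not the actual DSEE Extreme architecture.

# Toy sketch: the same convolutional idea over image pixels vs. audio samples.
import torch
import torch.nn as nn

# Image-style block: 2D convolution over height x width.
image_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.ReLU(),
)

# Audio-style block: the same idea collapsed to 1D, convolving along time.
audio_block = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=9, padding=4),
    nn.ReLU(),
)

x_image = torch.randn(1, 3, 64, 64)   # batch, channels, height, width
x_audio = torch.randn(1, 1, 16000)    # batch, channels, samples (1 s at 16 kHz)
print(image_block(x_image).shape)     # torch.Size([1, 32, 64, 64])
print(audio_block(x_audio).shape)     # torch.Size([1, 32, 16000])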

Combining wearables and AI for an experience tailored to your lifestyle

Ohba: Because headphones equipped with "DSEE HX" were very well received, I wanted to bring the "DSEE Extreme" installed on the Walkman into the next-generation headphones. After consulting with Mr. Chinen and Mr. Yamamoto, we managed to optimize the processing load while maintaining the sound quality of "DSEE Extreme" and successfully ran the technology on the headphones' Bluetooth chip. With this technology in the WH-1000XM4 headphones, you can enjoy any audio source, such as MP3s or YouTube, in even more beautiful, near high-resolution quality. Another new feature of the WH-1000XM4 I would like to draw attention to is the "Speak-to-Chat" function, which automatically pauses the music whenever the wearer speaks so that they can hear ambient sounds. AI is used for this function as well.
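One generic way to shrink a model so it fits on a constrained chip is weight quantization; the Python sketch below illustrates the idea. It is only an example of "optimizing the processing load" in general, not a description of how DSEE Extreme was actually ported to the Bluetooth chip.

# Toy sketch: 8-bit weight quantization (roughly 4x smaller than float32).
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 plus a single scale factor."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))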

Yamamoto: Enabling the headphones to discern whether the wearer or someone nearby is speaking was more difficult than we had imagined. We made efficient use of the five microphones already built into the outside of the WH-1000XM4 headphones. In fact, this function was developed at the same time as "DSEE Extreme™", and since the two algorithms are very similar, developing them in parallel enhanced the performance of both.
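As an illustration of the detection problem Yamamoto describes, the toy sketch below decides from multiple microphone signals whether the wearer is the one speaking. The level-symmetry heuristic and thresholds are invented for illustration and are not the product's actual algorithm.

# Toy sketch: own-voice detection from a short multi-microphone frame.
import numpy as np

def is_wearer_speaking(mics: np.ndarray, energy_thresh: float = 1e-3,
                       symmetry_thresh: float = 0.8) -> bool:
    """mics: array of shape (n_mics, n_samples); mics[0]/mics[1] assumed left/right."""
    energies = np.mean(mics ** 2, axis=1)
    if energies.max() < energy_thresh:
        return False  # no speech-level energy at all
    # Assumption: the wearer's own voice reaches both earcups at similar levels,
    # while a nearby talker is usually louder on one side.
    left, right = energies[0], energies[1]
    symmetry = min(left, right) / (max(left, right) + 1e-12)
    return symmetry > symmetry_thresh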

Ohba: Unlike the wake words that smart speakers listen for, the "Speak-to-Chat" function recognizes any spoken words, which makes it a very user-friendly, natural user interface (UI). In addition to "Speak-to-Chat", the headphones support a function called "Adaptive Sound Control". Using a dedicated smartphone app together with the smartphone's acceleration sensors and GPS, this feature recognizes things such as the user's "behavior" (walking, staying, running, and transport) and "frequently visited locations" and automatically changes the headphone settings. This time, with a newly updated dedicated app, the AI embedded in the app can learn your "frequently visited locations" in addition to the "behavior recognition" it already supported. I believe using AI both for audio and for the UI is something unique to Sony. Moreover, the fact that it runs on a wearable device with small batteries is ground-breaking.
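A rough Python sketch of the two inputs described for Adaptive Sound Control, accelerometer-based behavior recognition and GPS-based learning of frequently visited places, is shown below. The thresholds, the clustering radius, and the setting names are illustrative assumptions, not the app's actual logic.

# Toy sketch: behavior recognition and "frequent place" learning.
import numpy as np

def recognize_behavior(accel: np.ndarray, gps_speed_mps: float) -> str:
    """accel: (n, 3) accelerometer readings for a short window; speed from GPS."""
    step_activity = np.std(np.linalg.norm(accel, axis=1))
    if gps_speed_mps > 8.0:           # moving faster than a person runs
        return "transport"
    if step_activity < 0.3:
        return "staying"
    return "running" if step_activity > 3.0 else "walking"

def frequent_places(gps_points: np.ndarray, radius_deg: float = 0.001,
                    min_visits: int = 5):
    """Greedy clustering of (lat, lon) fixes; often-visited clusters become 'places'."""
    places = []  # list of (centroid, visit_count)
    for p in gps_points:
        for i, (c, n) in enumerate(places):
            if np.linalg.norm(p - c) < radius_deg:
                places[i] = ((c * n + p) / (n + 1), n + 1)
                break
        else:
            places.append((p, 1))
    return [c for c, n in places if n >= min_visits]

# Hypothetical mapping from recognized behavior to a headphone setting.
SETTINGS = {"staying": "noise_cancelling", "walking": "ambient_sound",
            "running": "ambient_sound", "transport": "noise_cancelling"}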

Using AI for creativity and entertainment

Ohba: We are always considering how AI can achieve what a user wants to do, as well as what the user has not even noticed, without losing the intrinsic value of the product. We also think about what new relationship can form between users and devices, and how we can create total "immersion" in entertainment in a way that feels natural. Because headphones are the devices in closest contact with people, it should be possible to sense things such as our living contexts, the circumstances of the moment, and even what we are feeling. I think this is something we should strive for, because Sony is in the business of creating excitement.

Yamamoto: AI learns from past data. Humans, on the other hand, can create new things without being confined to past data. In other words, humans have creativity, which is still difficult for AI. Since each has its own strengths and weaknesses, I believe a collaborative relationship will form naturally in which they support each other. For example, AI could perform tasks where the result would be the same no matter who did them, while humans perform work that requires creativity. That is one way I hope AI and humans can successfully coexist.

Chinen: We are often required to do high-value work within a limited period of time, yet much of that time is spent on routine tasks. As AI advances and spreads, it will allow people to devote far more of their time to creative work, and it will become a powerful tool for people who work in entertainment. Using AI to enhance creativity and entertainment is one of our goals at Sony.
