AI & Machine Learning

Pursue compact and high-performance AI

Audio Signal Processing and Speech Recognition

We are developing technology that accurately recognizes users’ natural speech amid background noise and reverberation. Our focus is on improving the real-world performance of audio signal processing and speech recognition. We use deep learning to optimally integrate the two, which enables advanced speech recognition in unfavorable conditions, such as when there is mechanical noise from robotics. By optimizing these technologies for specific devices and use cases, we aim to make them thoroughly user-friendly.

Image of Audio Signal Processing and Speech Recognition from input to output

Spoken Language Understanding / Natural Language Processing

We are developing Spoken Language Understanding technology to understand user utterances. This technology converts the text strings output by speech recognition into machine-understandable information (a semantic representation). Our models account for various linguistic phenomena such as disfluencies and abbreviations, and draw on a semantic database that links spoken language with the real world. To further the understanding of natural language itself, we are developing Natural Language Processing technology that analyzes text. This process involves tokenizing, assigning parts of speech and semantic attributes, and parsing the structure. We are also developing Knowledge Information Processing technology, which is applied to language disambiguation.
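As a rough illustration of converting a recognized text string into a semantic representation, the sketch below tokenizes an utterance, drops filler words (a very simple stand-in for disfluency handling), and fills an intent frame. The filler list, the keyword table, and the frame layout are hypothetical and chosen only for illustration; they do not describe the models actually used.

```python
import re

# Hypothetical filler words treated as disfluencies (illustrative only).
FILLERS = {"uh", "um", "er", "well"}

# Hypothetical keyword-to-intent table (illustrative only).
INTENT_KEYWORDS = {"play": "PLAY_MUSIC", "stop": "STOP", "weather": "GET_WEATHER"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def understand(text):
    """Map a recognized utterance to a toy semantic frame."""
    tokens = [t for t in tokenize(text) if t not in FILLERS]
    intent = next((INTENT_KEYWORDS[t] for t in tokens if t in INTENT_KEYWORDS), "UNKNOWN")
    # Everything after the first intent keyword is kept as one free-text slot.
    slot = []
    seen_keyword = False
    for t in tokens:
        if seen_keyword:
            slot.append(t)
        elif t in INTENT_KEYWORDS:
            seen_keyword = True
    return {"intent": intent, "tokens": tokens, "target": " ".join(slot)}
```

Calling `understand("Um, play uh some jazz")` yields a frame with the intent `PLAY_MUSIC` and the target `some jazz`. A production system would replace the keyword table with trained models and add part-of-speech tagging and parsing.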

Image of Spoken Language Understanding / Natural Language Processing from input to output

Deep Learning

Deep Learning is a machine-learning technology that allows users to create AI models that “recognize and predict” and “generate and transform” any data through the provision of training data. The R&D deep learning research team has been working on fundamental technologies including large-scale training, model compression, few-shot learning, generative modeling, and neural rendering. The outcomes of this research are integrated into the Neural Network Console, a GUI development tool, and the Neural Network Libraries, which are open-source libraries, to actively contribute to the advancement of the AI field. At Sony, these technologies are used not only in electronics products featuring AI technology, but also in entertainment businesses such as movies, music, and games, as well as in new types of business.

Overview of Deep Learning

Behavior Learning

Behavior Learning, including but not limited to deep reinforcement learning, is technology that enables an autonomous system to learn optimal behavior through its own trial-and-error experience. We aim to develop these technologies for planning actions in environments that are too complex or varied for humans to deal with, and for online optimization and control mechanisms that adapt effectively to environments that vary more than anyone could anticipate in advance. We aim to apply this technology in robotics, including both navigation and manipulation, and in gaming AI. We are also proactively working on joint research projects with overseas universities and laboratories that use cutting-edge technologies.
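To make the idea of learning from trial and error concrete, here is a minimal sketch of tabular Q-learning, a standard textbook reinforcement learning method, on a toy five-cell corridor. The environment, the reward, and all hyperparameters are assumptions made for illustration and are far simpler than the deep reinforcement learning referred to above.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical task)
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Apply an action; the reward is 1.0 only when the goal cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by epsilon-greedy trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best action for each non-goal cell according to the learned values."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]
```

After training, `greedy_policy(train())` selects the move toward the goal in every cell. The agent is never told which direction is correct; it infers this from the rewards it encounters.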

Image of the learning process for Behavior Learning

Agent Platform

We are developing an Agent Platform that understands user utterances and text input, combines audio and visual representations, and responds with animated character representations. This Agent Platform uses multiple technologies including speech recognition, image recognition, spoken language understanding, interactive response generation, and visual expression. We are also building development tools, SDKs, and cloud systems for creating agent applications. Applications are being used in the entertainment field, such as animation and movies; in the financial field, such as life plan support; and in the B2B field, such as store and office reception and guidance.

Schematic diagram of Agent Platform

Data Analytics

Each business unit within the Sony Group is striving to create new customer value and new business through the use of data. The Sony Group is involved in many different businesses, including entertainment, electronics, network services, and finance, and therefore generates huge amounts of widely varying data on a daily basis. We are researching new machine-learning technologies and analysis platforms that allow us to make effective and easy use of this data. For example, we are currently developing Prediction One, which allows users to easily perform predictive analysis without being experts in either machine learning or statistics, as well as causal inference technology for personalization in direct-to-consumer (DTC) services. Using these core technologies and analysis platforms, we will be able to bring these advanced technologies to market.

Schematic diagram of Data Analytics


AI functions based on deep learning are widely used in many types of products and services. However, the training time required for deep learning increases every year because realizing more advanced AI functions requires “more and more training datasets” and “ever-larger models for training.” Therefore, we are developing the technologies required for GAIA, the Sony-dedicated supercomputer that we are currently constructing. In particular, we are focusing on three technologies: “AI-Optimized Architecture” based on the latest hardware, “Resource Management,” which aims to get the most out of the available hardware resources, and “Large-Scale Deep Learning,” which shortens training time dramatically through the use of multiple processors such as GPUs. Our aim is the ultra-acceleration of AI development and the creation of new value through the use of more advanced AI, while also integrating the “sData” training data sharing system.
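One common way to use multiple processors for training is data parallelism: each processor computes a gradient on its own shard of the data, and the gradients are averaged before every update. The sketch below simulates this in a single Python process on a one-parameter linear model. The worker count, learning rate, and synthetic data are assumptions for illustration, and nothing here reflects how GAIA itself is implemented.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the mean gradient."""
    return sum(values) / len(values)


def train_data_parallel(data, num_workers=4, lr=0.01, steps=200):
    """Simulate synchronous data-parallel gradient descent."""
    # Split the data into equally sized shards, one per simulated worker.
    size = len(data) // num_workers
    shards = [data[i * size:(i + 1) * size] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        # On real hardware these gradients would be computed concurrently.
        grads = [shard_gradient(w, s) for s in shards]
        w -= lr * all_reduce_mean(grads)
    return w


# Synthetic data generated with a true weight of 3 (illustrative only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so `train_data_parallel(data)` converges to a weight close to 3 just as single-worker training would, while the per-step work is divided among the workers.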

Schematic diagram of GAIA

Explainable AI

Explainable AI is a technology that visualizes the basis for the judgments made by machine learning, which is often described as a “black box,” so that humans can understand them. The goal of these efforts is to create AI that can be trusted by humans. Machine learning has seen remarkable progress in recent years, with innovation apparent in every field. At the same time, it raises ethical issues, including concerns about AI causing harm to humans. Therefore, we are undertaking research and development on explainable AI to address fairness, accountability, and transparency in machine learning through technology. Through the use of explainable AI, we aim to develop a form of AI that allows humans to understand why it makes particular decisions and to control those decisions.
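One simple way to expose the basis of a model's judgment is occlusion-based attribution: replace one input feature at a time with a baseline value and record how much the output changes. The sketch below applies this to a hypothetical scoring function. The model, the inputs, and the baseline are invented for illustration, and this is only one of many explanation techniques, not necessarily the one used in this research.

```python
def occlusion_attribution(model, x, baseline):
    """Score each feature by the output change when it is set to its baseline."""
    full = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # hide feature i
        scores.append(full - model(occluded))
    return scores


def toy_model(x):
    """Hypothetical black-box scorer used only for this example."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2]
```

For the input `[1.0, 2.0, 3.0]` with an all-zero baseline, the scores show that the first feature raises the output, the second lowers it, and the third has no effect. The function treats the model purely as a callable, so the same procedure applies to models whose internals are not accessible.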

Process image of building Explainable AI