Bringing video and music to life with AI Sony’s deep generative models - STEF2022 Technology that inspires emotion - Activities - R&D

Bringing video and music to life with AI: Sony’s deep generative models

<Overview>
Sony’s deep generative model for content creation and restoration

We are developing deep generative models, a form of AI that can both generate new content and restore data. A deep generative model learns to automatically produce new media content such as images, audio, text, or dialogue. Our large-scale deep generative modeling technologies have achieved state-of-the-art performance while using fewer computational resources. We will also show a new media restoration technique for creators that builds on this technology.

<Development manager>

Yuhta Takida
R&D Center
Sony Group Corporation

Chieh-Hsin Lai
R&D Center
Sony Group Corporation

<Main features>

The Vector-Quantized Variational Autoencoder (VQ-VAE), a method for learning efficient compression, has been widely used for various generation tasks. However, training the model stably is often laborious, and many trials are needed to obtain a good compressor. We developed a new training scheme for VQ-VAE, called Stochastically Quantized VAE (SQ-VAE), which improves VQ-VAE training and allows the model to be trained stably in a single training run.
Overview of our proposed SQ-VAE
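
To make the contrast between deterministic and stochastic quantization concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not Sony's implementation: the function names (`nearest_code`, `stochastic_code`), the toy codebook, and the choice of sampling a code with probability proportional to exp(-distance² / (2 · variance)) are all assumptions made for this example. It covers only the quantization step, with no encoder, decoder, or training loop.

```python
import numpy as np

def nearest_code(z, codebook):
    """Deterministic (VQ-style) quantization: pick the closest codebook entry."""
    # Squared Euclidean distance between every latent and every code: shape (N, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1), d2

def stochastic_code(z, codebook, variance, rng):
    """Stochastic quantization (illustrative assumption): sample a code index
    from a categorical distribution that favors nearby codes."""
    _, d2 = nearest_code(z, codebook)
    logits = -d2 / (2.0 * variance)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(len(codebook), p=p) for p in probs])
    return idx, probs

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # hypothetical codebook: 8 codes of dimension 4
z = rng.normal(size=(5, 4))          # hypothetical encoder outputs

hard_idx, _ = nearest_code(z, codebook)
soft_idx, probs = stochastic_code(z, codebook, variance=1.0, rng=rng)
sharp_idx, _ = stochastic_code(z, codebook, variance=1e-6, rng=rng)
```

With a large variance, several codes receive non-negligible probability, so more of the codebook is exercised. As the variance shrinks, the sampled index coincides with the nearest-neighbor choice of ordinary vector quantization.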
Diffusion models are a popular class of generative models that generate new data by gradually transforming noise into a meaningful object. They are related to the physical phenomenon known as the diffusion process. Sony has been developing a novel training strategy for diffusion models, based on a newly derived equation that follows physical laws, which improves the learned models for generation.
The proposed training strategy guided by a newly derived equation following physical laws: Score FPE
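
The idea of gradually transforming noise into data can be sketched on a toy problem. The example below is a generic illustration, not the Score FPE training strategy and not Sony's sampler. It assumes a one-dimensional Gaussian "dataset" N(`MU`, `S`²), for which the score (the gradient of the log-density) of the noised data is known in closed form, and integrates the deterministic probability-flow ODE dx/dσ = -σ · score(x, σ) from a high noise level down to zero. In a real diffusion model the `score` function would be a trained neural network; all names and constants here are hypothetical.

```python
import numpy as np

MU, S = 2.0, 0.5   # toy data distribution N(MU, S^2), assumed for illustration

def score(x, sigma):
    """Exact score of the data after adding Gaussian noise of std sigma.
    The noised distribution is N(MU, S^2 + sigma^2)."""
    return -(x - MU) / (S**2 + sigma**2)

def sample(n, sigma_max=50.0, sigma_min=0.01, steps=200, seed=0):
    """Turn pure noise into samples by Euler steps along decreasing noise levels."""
    rng = np.random.default_rng(seed)
    # Geometric noise schedule from sigma_max to sigma_min, then a final step to 0.
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = rng.normal(0.0, sigma_max, size=n)         # start from (almost) pure noise
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        dx_dsigma = -s_cur * score(x, s_cur)        # probability-flow ODE direction
        x = x + (s_next - s_cur) * dx_dsigma        # Euler step toward less noise
    return x

samples = sample(20000)
print(samples.mean(), samples.std())   # should land close to MU and S
```

Because the score is exact here, any remaining mismatch comes only from the Euler discretization and from starting at a finite `sigma_max`.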
Many existing AI-based restoration techniques require pairs of original clean data and the corresponding corrupted observations. However, collecting such paired data is expensive and laborious. Sony has developed a novel method for restoring media that relies on diffusion models trained only on clean data.
Content restoration utilizing a deep generative model
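
One generic way to see how a model trained only on clean data can restore corrupted data is posterior sampling: the clean-data score is combined with a term that measures agreement with the observation. The sketch below is an assumption-laden toy, not the method described above. It assumes a one-dimensional Gaussian clean-data prior, a known linear degradation y = `A` · x + noise with standard deviation `TAU`, and closed-form scores. Every name and constant is hypothetical.

```python
import numpy as np

MU, S = 2.0, 0.5     # toy clean-data prior N(MU, S^2); stands in for a trained model
A, TAU = 0.5, 0.3    # assumed known degradation: y = A * x0 + N(0, TAU^2)

def prior_score(x, sigma):
    """Score of clean data noised to level sigma (learned by a network in practice)."""
    return -(x - MU) / (S**2 + sigma**2)

def likelihood_score(x, sigma, y):
    """Gradient of log p(y | x_sigma); exact for this Gaussian toy problem."""
    d = S**2 + sigma**2
    m = (S**2 * x + sigma**2 * MU) / d      # mean of clean x0 given noisy x
    v = S**2 * sigma**2 / d                 # variance of clean x0 given noisy x
    return A * (S**2 / d) * (y - A * m) / (A**2 * v + TAU**2)

def restore(y, n, sigma_max=50.0, sigma_min=0.01, steps=200, seed=0):
    """Sample plausible clean values for observation y; no paired data is used."""
    rng = np.random.default_rng(seed)
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = rng.normal(0.0, sigma_max, size=n)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        total = prior_score(x, s_cur) + likelihood_score(x, s_cur, y)
        x = x + (s_next - s_cur) * (-s_cur * total)   # Euler step, posterior flow
    return x

y = 1.5                         # a single corrupted observation
restored = restore(y, 20000)

# Closed-form posterior for comparison (available only in this Gaussian toy case).
post_prec = 1.0 / S**2 + A**2 / TAU**2
post_mean = (MU / S**2 + A * y / TAU**2) / post_prec
post_std = post_prec ** -0.5
```

The prior term never sees corrupted data. Only `likelihood_score` encodes the degradation, so the same clean-data model could in principle be reused for a different corruption by swapping that one function.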

<Future direction>

We expect AI such as deep generative models to become an integral part of the music, film, and gaming industries in the years to come. We at Sony R&D have the unique opportunity to work directly with world-leading entertainment groups within these industries, and we want to make the most of this opportunity.
