The Thirty-Sixth AAAI Conference on Artificial Intelligence

February 22 to March 1, 2022
(AAAI-2022 is a Virtual-only Conference)

The purpose of the AAAI conference is to promote research in artificial intelligence (AI) and scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines. AAAI-22 will have a diverse technical track, student abstracts, poster sessions, invited speakers, tutorials, workshops, and exhibit and competition programs, all selected according to the highest reviewing standards.

Recruitment information for AAAI-2022

We look forward to highly motivated individuals applying to Sony so that we can work together to fill the world with emotion and pioneer the future with dreams and curiosity. Join us and be part of a diverse, innovative, creative, and original team to inspire the world.

For Sony AI positions, please see

*The special job offer for AAAI-2022 has closed. Thank you for your many applications.


Technology 01

Outracing champion Gran Turismo drivers with deep reinforcement learning

A superhuman racing AI agent

We are thrilled to announce Sony AI's very first AI breakthrough, Gran Turismo Sophy, which is featured on the cover of the February 10 issue of Nature.

Sony AI, together with Polyphony Digital Inc. (PDI) and Sony Interactive Entertainment (SIE), announced a breakthrough in artificial intelligence (AI) called Gran Turismo Sophy™ ("GT Sophy") - the first superhuman AI agent to outrace the world’s best drivers of the highly realistic PlayStation®4 racing simulation game, Gran Turismo™ (GT) Sport. GT Sophy aims to deliver new AI-powered gaming experiences to players around the world.
GT Sophy was trained to master the following driving skills needed to compete with the world’s best championship-level drivers.

Link to movie: The Making of Gran Turismo Sophy [English]
  1. Race Car Control: Deep understanding of car dynamics, racing lines, and precision maneuvers to conquer challenging tracks.
  2. Racing Tactics: Split-second decision-making skills in response to rapidly evolving racing situations. GT Sophy showed mastery of tactics including slipstream passing, crossover passes and even some defensive maneuvers such as blocking.
  3. Racing Etiquette: Essential for fair play, GT Sophy had to conform to highly refined, but imprecisely specified, sportsmanship rules including avoiding at-fault collisions and respecting opponent driving lines.

Sony AI and its partners built a novel deep reinforcement learning approach and platform, since no existing combination of algorithms and infrastructure could solve this challenge. Mastering the complex sport of race car driving in the highly realistic driving simulator Gran Turismo Sport represents a new breakthrough in AI, and the work is published today in Nature in an article titled "Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning."

Details on the Gran Turismo Sophy project, technology and the Race Together 2021 challenge races can be found at the Gran Turismo Sophy site.

Technology 02

Using super-resolution technology in ray tracing

Balance high resolution and production efficiency
Link to movie: Using super-resolution technology in ray tracing

Drawing on know-how in machine-learning-based image technology cultivated since the 1990s, this technology maximizes performance with limited computing resources and delivers high-resolution, high-precision results for images spanning a wide variety of scenes and quality levels. For 3D content with large amounts of data, it can reduce production time by a factor of a few hundred by tracing fewer rays and rendering the images based on information such as character shape, texture, and lighting. The technology is being developed in cooperation with Sony Pictures Entertainment, reflecting feedback from creators, with the aim of extending it from 2D to 3D and applying it widely across the entertainment field.

  • Takafumi Morifuji

    R&D Center, Sony Group Corporation

    He leads the research and development of machine-learning-based visual technologies for a wide range of fields, from electronic products to games, movies, and other areas of entertainment, at Sony Group Corporation.
    He received a master's degree in computer science from Osaka University in 1999, after which he joined Sony. For more than 20 years he has been engaged in various research and development projects, including image signal processing algorithms and system architecture design.

Business use case

Case 01

Sony's character conversation AI technology (束縛彼氏, "Possessive Boyfriend")

Link to Site: Sony's character conversation AI technology


Publication 01

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

Lingjuan Lyu
Sony AI

Yu Guo, Wen Liu, Jiangtian Nie, Lingjuan Lyu, Zehui Xiong, Jiawen Kang, Han Yu and Dusit Niyato

Visual surveillance technology is an indispensable component of advanced traffic management systems. It has been applied to traffic supervision tasks such as object detection, tracking, and recognition. However, adverse weather conditions, e.g., fog, haze, and mist, pose severe challenges for video-based transportation surveillance. To eliminate the influence of adverse weather conditions, we propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement. It consists of a dual attention module (DAM) and a high-low frequency-guided sub-net (HLFN) that jointly consider attention and frequency mapping to guide haze-free scene reconstruction. Extensive experiments on both synthetic and real-world images demonstrate the superiority of DADFNet over state-of-the-art methods in terms of visibility enhancement and improvement in detection accuracy. Furthermore, DADFNet takes only 6.3 ms to process a 1,920×1,080 image on a 2080 Ti GPU, making it highly efficient for deployment in intelligent transportation systems.
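The abstract does not say how HLFN separates high and low frequencies. As a rough illustration only, the sketch below splits a grayscale frame into a low-frequency component (a box blur, which is our assumption) and the high-frequency residual. A common intuition in dehazing work is that haze mostly affects the smooth, low-frequency part of an image, while edges and texture sit in the residual. The names `box_blur` and `split_frequencies` are hypothetical and not part of DADFNet.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Low-pass filter: mean over a k x k window (k odd), edges replicated."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate the k*k shifted copies of the image, then average them.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def split_frequencies(img: np.ndarray, k: int = 5):
    """Return (low, high) with low + high == img (up to float rounding)."""
    low = box_blur(img, k)
    high = img.astype(np.float64) - low
    return low, high


# Example on a random grayscale "frame" with values in [0, 1).
frame = np.random.default_rng(0).random((4, 6))
low, high = split_frequencies(frame, k=3)
print(low.shape, high.shape)  # (4, 6) (4, 6)
```

In a learned network the two branches would be processed by separate sub-networks; here the split is a fixed filter purely to show the decomposition.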

Publication 02

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

Lingjuan Lyu
Sony AI

Xuanli He, Qiongkai Xu, Lingjuan Lyu (corresponding author), Fangzhao Wu, Chenguang Wang

Nowadays, owing to breakthroughs in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs that serve over half a billion people worldwide and process over one hundred billion word generations per day. Thus, NLG APIs have become essential, profitable services for many commercial companies. Because of the substantial financial and intellectual investment involved, service providers adopt a pay-as-you-use policy to promote sustainable market growth. However, recent works have shown that cloud platforms suffer financial losses from model extraction attacks, which aim to imitate the functionality and utility of the victim services, thus violating the intellectual property (IP) of cloud APIs. This work aims to protect the IP of NLG APIs by identifying attackers who have used watermarked responses from the victim NLG APIs. However, most existing watermarking techniques are not directly amenable to IP protection of NLG APIs. To bridge this gap, we first present a novel watermarking method for text generation APIs that applies lexical modifications to the original outputs. Compared with competitive baselines, our watermarking approach achieves better identifiability in terms of p-value, with smaller semantic loss. In addition, our watermarks are more understandable and intuitive to humans than the baselines. Finally, empirical studies show that our approach also applies to queries from different domains and remains effective against an attacker trained on a mixed corpus containing less than 10% watermarked samples.
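To make the idea of a lexical watermark and a p-value test concrete, here is a minimal sketch under stated assumptions. It substitutes words with designated synonyms and then tests whether a suspect text over-uses those synonyms with a one-sided binomial test. The synonym table `WATERMARK`, the null probability `p0 = 0.5`, and all function names are hypothetical illustrations, not the paper's actual word selection or detection procedure.

```python
import math
import re

# Hypothetical watermark table: word the API would normally emit -> synonym
# that the watermarked API emits instead. Not taken from the paper.
WATERMARK = {"big": "large", "quick": "rapid", "buy": "purchase", "help": "assist"}
TARGETS = set(WATERMARK.values())
CANDIDATES = set(WATERMARK) | TARGETS
WORD = re.compile(r"[A-Za-z]+")


def _swap(match):
    word = match.group(0)
    replacement = WATERMARK.get(word.lower())
    if replacement is None:
        return word
    # Keep an initial capital so the edit stays inconspicuous.
    return replacement.capitalize() if word[0].isupper() else replacement


def watermark(text: str) -> str:
    """Lexically watermark an API response by substituting synonyms."""
    return WORD.sub(_swap, text)


def binomial_tail(k: int, n: int, p0: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def detect(text: str, p0: float = 0.5) -> float:
    """p-value for 'this text over-uses the watermark synonyms'.

    Assumes (hypothetically) that, without a watermark, each candidate word is
    the watermark synonym with probability p0.
    """
    hits = [w.lower() for w in WORD.findall(text) if w.lower() in CANDIDATES]
    k = sum(w in TARGETS for w in hits)
    return binomial_tail(k, len(hits), p0)


print(watermark("Big deals help a quick launch"))  # Large deals assist a rapid launch
```

A model imitated from watermarked responses would tend to keep choosing "large" over "big", so the binomial tail probability shrinks toward zero as more such choices are observed.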

Publication 03

fGOT: Graph Distances based on Filters and Optimal Transport

Mireille El Gheche
Sony AI

Hermina Petric Maretic, Mireille El Gheche, Giovanni Chierchia, Pascal Frossard

Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work, we introduce the filter graph distance, an optimal transport based distance that drives graph comparison through the probability distribution of filtered graph signals.
This creates a highly flexible distance, capable of prioritising different spectral information in observed graphs, offering a wide range of choices for a comparison metric. We tackle the problem of graph alignment by computing graph permutations that minimise our new filter distances, which implicitly solves the graph comparison problem.
We then propose a new approximate cost function that circumvents many computational difficulties inherent to graph comparison and permits the exploitation of fast algorithms such as mirror gradient descent, without grossly sacrificing performance. We finally propose a novel algorithm derived from a stochastic version of mirror gradient descent, which accommodates the non-convexity of the alignment problem, offering a good trade-off between accuracy and speed. The experiments on graph alignment and classification show that the flexibility gained through filter graph distances can have a significant impact on performance, while the speed-up offered by the approximate cost function makes the framework applicable in practical settings.
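As a hedged illustration of comparing graphs through the distributions of filtered graph signals, the sketch below models each graph's filtered signal as g(L)w with white noise w ~ N(0, I). That signal is a zero-mean Gaussian with covariance S = g(L)^2, and two such Gaussians can be compared with the closed-form 2-Wasserstein distance W2^2 = tr(S1) + tr(S2) - 2 tr((S1^(1/2) S2 S1^(1/2))^(1/2)). The heat-kernel filter g(L) = exp(-tau L), the white-noise input, and all function names are our assumptions. The sketch also assumes the node correspondence is already known, so it does not implement the permutation search or the stochastic mirror gradient descent described above.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(adj.sum(axis=1)) - adj


def psd_sqrt(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def heat_filter(lap: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Assumed example filter g(L) = exp(-tau * L)."""
    vals, vecs = np.linalg.eigh(lap)
    return (vecs * np.exp(-tau * vals)) @ vecs.T


def filter_graph_distance(adj1: np.ndarray, adj2: np.ndarray, tau: float = 1.0) -> float:
    """W2 distance between N(0, g(L1)^2) and N(0, g(L2)^2), nodes pre-aligned."""
    g1 = heat_filter(laplacian(adj1), tau)
    g2 = heat_filter(laplacian(adj2), tau)
    s1, s2 = g1 @ g1, g2 @ g2  # covariances of filtered white noise
    root1 = psd_sqrt(s1)
    cross = psd_sqrt(root1 @ s2 @ root1)
    d2 = np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))


path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.ones((3, 3)) - np.eye(3)
print(filter_graph_distance(path, triangle))
```

Swapping in a different filter changes which spectral components dominate the comparison, which is the flexibility the abstract attributes to filter graph distances.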

Publication 04

Byzantine-resilient Federated Learning via Gradient Memorization

Lingjuan Lyu
Sony AI

Chen Chen, Lingjuan Lyu, Yuchen Liu, Fangzhao Wu, Chaochao Chen and Gang Chen

Federated learning (FL) provides a privacy-aware learning framework by enabling a multitude of participants to jointly construct models without collecting their private training data. However, federated learning has exhibited vulnerabilities to Byzantine attacks. Many existing methods defend against such Byzantine attacks by monitoring the gradients of clients in the current round, i.e., gradients in one round. Recent works have demonstrated that such naïve methods can hardly achieve satisfying performance. Defenses based on one-round gradients could be compromised by adding a small well-crafted bias to the benign gradients, due to the high variance of one-round (benign) gradients. To address this problem, we propose a new Average of Gradients (AG) framework, which detects Byzantine attacks with the average of multi-round gradients (i.e., gradients across multiple rounds). We theoretically show that our AG framework leads to lower variance of the benign gradients, and thus can reduce the effects of Byzantine attacks. Experiments on various real-world datasets verify the efficacy of our AG framework.
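The abstract does not specify which detector is applied to the averaged gradients, so the sketch below assumes a simple one. It keeps a running per-client mean of gradients across rounds, then flags clients whose mean lies more than `factor` times the median distance from the coordinate-wise median. `AverageOfGradients`, `flag_outliers`, and the toy attack are hypothetical. The intuition matches the abstract: if benign noise is independent across rounds, averaging over T rounds shrinks its variance by a factor of T, while a persistent attacker bias does not shrink.

```python
import numpy as np


class AverageOfGradients:
    """Running per-client mean of gradients across rounds (hypothetical helper)."""

    def __init__(self, num_clients: int, dim: int):
        self.sums = np.zeros((num_clients, dim))
        self.rounds = 0

    def update(self, grads: np.ndarray) -> np.ndarray:
        """Add this round's (num_clients, dim) gradients; return multi-round means."""
        self.sums += grads
        self.rounds += 1
        return self.sums / self.rounds


def flag_outliers(grads: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Flag rows farther than factor x median distance from the coordinate-wise median."""
    center = np.median(grads, axis=0)
    dists = np.linalg.norm(grads - center, axis=1)
    return dists > factor * (np.median(dists) + 1e-12)


# Toy simulation: clients 0 and 1 add a small constant bias to noisy gradients.
rng = np.random.default_rng(0)
num_clients, dim, byzantine = 10, 20, 2
bias = np.zeros((num_clients, dim))
bias[:byzantine] = 0.5
ag = AverageOfGradients(num_clients, dim)
for r in range(200):
    grads = rng.normal(0.0, 1.0, size=(num_clients, dim)) + bias
    averaged = ag.update(grads)
    if r == 0:
        one_round_flags = flag_outliers(grads)
multi_round_flags = flag_outliers(averaged)
print(one_round_flags[:byzantine], multi_round_flags[:byzantine])
```

In this toy setting the one-round distances are dominated by noise, so the small bias hides; after many rounds the averaged benign gradients cluster tightly and the biased clients stand apart.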

Publication 05

GEAR: A Margin-based Federated Adversarial Training Approach

Lingjuan Lyu
Sony AI

Chen Chen, Jie Zhang and Lingjuan Lyu

Previous studies have shown that federated learning (FL) is vulnerable to well-crafted adversarial examples. Some recent efforts have tried to combine adversarial training with FL, i.e., federated adversarial training (FAT), to achieve adversarial robustness in FL. However, most existing FAT works suffer from either low natural accuracy or low robust accuracy. Moreover, none of these works provides an in-depth understanding of the challenges behind adversarial robustness in FL. To address these issues, we propose a novel marGin-based fEderated Adversarial tRaining Approach called GEAR. It encourages minority classes to have larger margins by introducing a margin-based cross-entropy loss, and regularizes the decision boundary to be smooth by introducing a regularization loss, thus providing a better decision boundary for the global model. To the best of our knowledge, this work is the first to investigate the impact of the decision boundary on FAT, and it delivers the best natural and robust accuracy in FL to date. Extensive experiments on multiple datasets across various settings validate the effectiveness of our proposed method. For example, on the SVHN dataset, GEAR improves the natural accuracy and robust accuracy (against the FGSM attack) of the best baseline method (FedTRADES) by 20.17% and 10.73%, respectively.
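The abstract does not give the exact form of GEAR's margin-based cross-entropy. The sketch below swaps in the well-known label-distribution-aware margin (LDAM) idea, in which class j receives a margin proportional to n_j^(-1/4), so rarer classes get larger margins; the margin is subtracted from the true-class logit before the softmax cross-entropy. The smoothness regularizer and the federated training loop are omitted, and `max_margin` and all function names are assumptions.

```python
import math


def class_margins(class_counts, max_margin=0.5):
    """LDAM-style margins: proportional to n_j ** -0.25, largest = max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def margin_cross_entropy(logits, label, margins):
    """Cross-entropy after subtracting the true class's margin from its logit."""
    adjusted = list(logits)
    adjusted[label] -= margins[label]
    # Numerically stable log-sum-exp.
    top = max(adjusted)
    log_sum = top + math.log(sum(math.exp(z - top) for z in adjusted))
    return log_sum - adjusted[label]


margins = class_margins([1000, 10])  # class 1 is the minority class
print(margins[1] > margins[0])  # True
```

Because the minority class must beat the other logits by a larger margin before its loss becomes small, training pushes the decision boundary farther from minority-class examples.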
