


35th AAAI Conference on Artificial Intelligence

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) will be held virtually February 2 - 9, 2021.
The purpose of the AAAI conference is to promote research in artificial intelligence (AI) and scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines.
AAAI-21 will have a diverse technical track, student abstracts, poster sessions, invited speakers, tutorials, workshops, and exhibit and competition programs, all selected according to the highest reviewing standards.

February 2 - 9, 2021
(AAAI-21 is a Virtual-only Conference)

As many of you already know, AAAI-21 is going virtual. As one of the event's sponsors, Sony had intended to hold sessions and host a technology exhibition in the sponsor booth. Instead, we will introduce several of Sony's latest technologies combining AI, sensing, and computer vision on this site, including some that are still in the development stage. While it may not be the same as meeting in person, we are keen to contribute to the virtual conference in any way that we can.

Sony at AAAI-21

Including papers by Sony members

Times are given in PST (Pacific Standard Time). If you are in CET (Central European Time) or JST (Japan Standard Time), please add 9 or 17 hours, respectively. The AAAI-21 Virtual Program for each main conference day (February 4-7) will have three (3) segments, each consisting of a 75-minute live plenary session and a 1.75-hour poster session:
07:30 - 10:30 (PST)
15:30 - 18:30 (PST)
23:30 - 26:30 (PST)

- February 4th, 2021 (PST)

Work at Sony (1)

Time: February 4th, 18:30 - 19:00 (PST)

Work at Sony (2)

Time: February 4th, 26:30 - 27:00 (PST)

- February 5th, 2021 (PST)

Demonstration Program (1) / (2)

Time: February 5th, 08:45 - 10:30 / 16:45 - 18:30 / 24:45 - 26:30 (PST)
DEMO-319: "Demonstration of the EMPATHIC Framework for Task Learning from Implicit Human Feedback"
Peter Stone, The University of Texas at Austin and Sony AI, USA

Work at Sony (3)

Time: February 5th, 18:30 - 19:00 (PST)

- February 6th, 2021 (PST)

Demonstration Program (3)

Time: February 6th, 24:45 - 26:30 (PST)
DEMO-319: "Demonstration of the EMPATHIC Framework for Task Learning from Implicit Human Feedback"
Peter Stone, The University of Texas at Austin and Sony AI, USA

- February 8th, 2021 (PST)

W4: Artificial Intelligence Safety (SafeAI 2021) (Poster Session)

Time: February 8th, 06:20 - 06:30 (PST)
"Adversarial Attacks for Tabular Data: Application to Fraud Detection and Imbalanced Data"
Francesco Cartella, Orlando Anunciação, Yuki Funabiki, Daisuke Yamaguchi, Toru Akishita, Olivier Elshocht

W11: Explainable Agency in Artificial Intelligence (Workshop) (1)

Time: February 8th
"Effects of Uncertainty on the Quality of Feature Importance Estimates"
Torgyn Shaikhina, Umang Bhatt, Roxanne Zhang, Konstantinos Georgatzis, Alice Xiang and Adrian Weller

- February 9th, 2021 (PST)

W20: Plan, Activity, and Intent Recognition (PAIR) 2021 (Invited Talk)

Time: February 9th, noon-12:45 PM (PST)
"Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination"
Peter Stone, The University of Texas at Austin and Sony AI, USA

W11: Explainable Agency in Artificial Intelligence (Workshop) (2)

Time: February 9th
"Effects of Uncertainty on the Quality of Feature Importance Estimates"
Torgyn Shaikhina, Umang Bhatt, Roxanne Zhang, Konstantinos Georgatzis, Alice Xiang and Adrian Weller


Technology 1
Reinforcement Learning for Optimization of COVID-19 Mitigation Policies

The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health, while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this research, we study how reinforcement learning (RL) can be used to optimize mitigation policies that minimize the economic impact without overwhelming the hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; and (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator. Our results validate both the overall simulator behavior and the learned policies under realistic conditions.
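The setup can be sketched in a toy form. The simulator and policies below are purely illustrative stand-ins (a single infection rate, three mitigation levels), not the paper's agent-based model, but they show the reward trade-off between economic cost and hospital capacity that the RL agent optimizes:

```python
class ToySimulator:
    """Hypothetical stand-in for an agent-based pandemic simulator:
    one aggregate infection rate driven by the chosen mitigation level."""
    def __init__(self):
        self.infected = 0.01  # fraction of the population infected

    def step(self, action):
        # Stricter mitigation (higher action) slows spread but costs more.
        growth = {0: 1.3, 1: 1.1, 2: 0.9}[action]
        self.infected = min(1.0, self.infected * growth)
        hospital_penalty = 10.0 if self.infected > 0.3 else 0.0  # over capacity
        economic_cost = 0.5 * action
        return -economic_cost - hospital_penalty  # reward to maximize

def rollout(policy, steps=20):
    """Total reward of a mitigation policy over one simulated outbreak."""
    sim, total = ToySimulator(), 0.0
    for _ in range(steps):
        total += sim.step(policy(sim.infected))
    return total

no_mitigation = lambda infected: 0
adaptive = lambda infected: 2 if infected > 0.1 else 1
```

An RL algorithm would search over such state-dependent policies; even in this toy model, the adaptive policy avoids the large hospital-overflow penalties that doing nothing incurs.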

Joint research by Varun Kompella, Roberto Capobianco, Stacy Jong, Jonathan Browne, Spencer Fox, Lauren Meyers, Peter Wurman, and Peter Stone.

  • Peter Stone

    Sony AI America

Professor Peter Stone is the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as associate department chair and Director of Texas Robotics. He was a co-founder of Cogitai, Inc. and is now Executive Director of Sony AI America. His main research interest in AI is understanding how to best create complete intelligent agents. He considers adaptation, interaction, and embodiment to be essential capabilities of such agents. Thus, his research focuses mainly on machine learning, multiagent systems, and robotics. To him, the most exciting research topics are those inspired by challenging real-world problems. He believes that complete successful research includes both precise, novel algorithms and fully implemented and rigorously evaluated applications. His application domains have included robot soccer, autonomous bidding agents, autonomous vehicles, and human-interactive agents.

Technology 2
Hypotheses Generation for Applications in Biomedicine and Gastronomy

Hypothesis generation is the problem of discovering meaningful, implicit connections in a particular domain. We focus on two application areas: 1) biomedicine, and the discovery of new connections between scientific terms such as diseases, chemicals, drugs, and genes; and 2) food pairing, for discovering new connections between ingredients, taste, and flavor molecules. Sony AI and its academic partners have developed a variety of models that explore representation learning and novel link prediction models for these tasks. In the biomedical domain, we developed models able to leverage temporal data about how connections between concepts have emerged over the last 80 years. In the food domain, we deal with multi-partite graphs that link ingredients with molecule information and health aspects of ingredients. The talk will introduce hypothesis generation as a graph embedding representation learning and link prediction task. We'll present recently published models that integrate 1) variational inference for estimating priors, 2) graph embedding learning regimes, and 3) application of embeddings in training ranking models.
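The core framing — learn concept embeddings from known connections, then score unconnected pairs as candidate hypotheses — can be sketched minimally. The graph, node names, and training rule below are purely illustrative assumptions, far simpler than the published models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy concept graph (node names are illustrative only).
nodes = ["aspirin", "inflammation", "headache", "garlic", "sulfur_compound"]
edges = [("aspirin", "inflammation"), ("inflammation", "headache"),
         ("garlic", "sulfur_compound")]

idx = {n: i for i, n in enumerate(nodes)}
emb = rng.normal(size=(len(nodes), 8))  # random initial concept embeddings

# Crude representation learning: pull embeddings of connected concepts together.
for _ in range(300):
    for u, v in edges:
        i, j = idx[u], idx[v]
        diff = emb[i] - emb[j]
        emb[i] -= 0.05 * diff
        emb[j] += 0.05 * diff

def link_score(u, v):
    """Dot-product link score; a high score for a pair with no observed edge
    (e.g. the two-hop pair aspirin-headache) is a candidate hypothesis."""
    return float(emb[idx[u]] @ emb[idx[v]])
```

The published models replace this averaging step with learned graph embeddings, variational priors, and ranking losses, but the prediction interface — score unseen concept pairs — is the same.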

  • Michael Spranger

    Sony AI Tokyo, Japan

Dr. Michael Spranger is the COO of Sony AI Inc., Sony's strategic research and development organization established in April 2020. Sony AI's mission is to "unleash human imagination and creativity with AI." Michael is a roboticist by training with extensive research experience in fields such as Natural Language Processing, robotics, and foundations of Artificial Intelligence. Michael has published more than 60 papers at top AI conferences such as IJCAI, NeurIPS, and others. Concurrently with his role at Sony AI, Michael also holds a researcher position at Sony Computer Science Laboratories, Inc., and actively contributes to Sony's overall AI Ethics strategy.

Technology 3
Sensing, AI, and Robotics at Sony AI

In this short video, I will introduce our approach to combining sensing, AI, and robotics at Sony AI.

  • Peter Duerr

    Sony AI Zürich, Switzerland

Dr. Peter Duerr is the Director of Sony AI in Zurich. After joining Sony in 2011, he worked on computer vision, AI, and robotics research in various assignments at Sony R&D Center and Aerosense in Tokyo, at Sony R&D Center Europe, and most recently at Sony AI in Zürich. Peter holds an MSc in mechanical engineering from ETH Zürich and a PhD in computer and communication science from EPFL in Lausanne.

Technology 4
Content Restoration/Protection/Generation Technologies

The first work is titled "D3Net: Densely connected multidilated DenseNet for music source separation". Dense prediction tasks, such as semantic segmentation and audio source separation, involve high-resolution inputs and predictions. In order to efficiently model the local and global structure of the data, we propose dense multiresolution learning, combining a dense skip connection topology with a novel multidilated convolution. The proposed D3Net achieves state-of-the-art performance on a music source separation task, where the goal is to separate individual instrument sounds from a music recording. A demo is available in another video, so please check it out!
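The multidilation idea can be illustrated in 1-D NumPy (a sketch only; D3Net applies it to 2-D spectrogram features inside a densely connected network): a single layer combines filters with several dilation factors, so local and global context are mixed at once.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D convolution with a 3-tap filter w at a given dilation."""
    xp = np.pad(x, dilation)
    return sum(w[k] * xp[k * dilation : k * dilation + len(x)] for k in range(3))

def multidilated_conv1d(x, filters, dilations=(1, 2, 4)):
    """Sketch of a multidilated convolution: each filter sees a different
    dilation, giving one layer several receptive-field scales simultaneously."""
    return sum(dilated_conv1d(x, w, d) for w, d in zip(filters, dilations))
```

With identity filters the layer simply passes the signal through once per dilation branch, which makes the aggregation easy to verify.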

In the second paper, we investigate adversarial attacks on the audio source separation problem. We found that it is possible to severely degrade separation performance by adding imperceptible noise to the input mixture under a white-box condition, while under a black-box condition, source separation methods exhibit a certain level of robustness. We believe this work is important for understanding the robustness of source separation models, as well as for developing content protection methods against the abuse of separated signals.

The last paper investigates posterior collapse in the Gaussian VAE. The VAE is a popular generative model thanks to its tractability and training stability, but it suffers from the problem of posterior collapse. We investigated the cause of posterior collapse in the Gaussian VAE from the viewpoint of local Lipschitz smoothness, and then proposed a modified ELBO-based objective function that adapts the hyperparameter in the ELBO automatically. Our new objective function prevents over-smoothing of the decoder, and thus posterior collapse.
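The role of the ELBO hyperparameter can be made concrete. The sketch below (not the paper's objective, which adapts the hyperparameter via local Lipschitz smoothness) writes the negative ELBO of a Gaussian decoder with a shared observation variance sigma2; that variance acts as the trade-off knob, and its reconstruction term has a closed-form minimum at sigma2 = MSE, one simple way to adapt it automatically:

```python
import numpy as np

def neg_elbo(x, x_hat, mu, logvar, sigma2):
    """Negative ELBO of a Gaussian VAE with shared decoder variance sigma2.
    Small sigma2 weights reconstruction heavily; large sigma2 lets the KL
    term dominate, over-smoothing the decoder toward posterior collapse."""
    d = x.size
    rec = 0.5 * (np.sum((x - x_hat) ** 2) / sigma2 + d * np.log(2 * np.pi * sigma2))
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return rec + kl

# Toy values just to probe the objective (illustrative, not real data).
x = np.array([0.2, -0.4, 0.9])
x_hat = np.array([0.1, -0.3, 1.0])
mu, logvar = np.zeros(2), np.zeros(2)
mse = float(np.mean((x - x_hat) ** 2))
```

Evaluating `neg_elbo` at sigma2 values above and below the MSE confirms the minimum sits at the data-driven choice.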

  • Naoya Takahashi

    Sony R&D Center Tokyo, Japan

He received his Ph.D. from the University of Tsukuba, Japan, in 2020. From 2015 to 2016, he worked at the Computer Vision Lab and Speech Processing Group at ETH Zurich as a visiting researcher. Since joining Sony Corporation in 2008, he has performed research in the audio, computer vision, and machine learning domains. In 2018, he won the Sony Outstanding Engineer Award, the highest form of individual recognition for Sony Group engineers. His current research interests include audio source separation, semantic segmentation, video highlight detection, event detection, speech recognition, and music technologies.

Technology 5
Video Colorization for Content Revitalization

Sony is highly interested in revitalizing old content, and video colorization is one such effort. However, video colorization is a challenging task, with temporal coherence and user controllability issues. We introduce our unique reference-based video colorization and demonstrate that it can alleviate these issues, helping revitalize old black-and-white videos.

  • Andrew Shin

    Sony R&D Center Tokyo, Japan

He received his Ph.D. from The University of Tokyo in 2017, after which he joined Sony. He has been working on the development of Sony's deep learning framework, Neural Network Libraries, as well as developing machine learning tools to support content creation for the entertainment business, and conducting core research.

  • Naofumi Akimoto

    Sony R&D Center Tokyo, Japan

He received his master's degree in engineering from Keio University in 2020, after which he joined Sony. He has been working on machine learning technologies for image and video enhancement for Sony's entertainment business.

Technology 6
Neural Architecture Search for Automating The Design of Deep Neural Networks

The application of Deep Neural Networks (DNNs) typically involves a large amount of network engineering. In practice, this is a laborious and very time-consuming task that must be performed with great care in order to get the maximum performance and efficiency, and it requires much expertise, experience, and intuition. At Sony, we are very interested in hardware-aware Neural Architecture Search (NAS) algorithms that automate the architecture design process. Hardware-aware means that we not only optimize DNN performance, but also enforce given hardware constraints such as memory, power, or latency constraints. In this presentation, we present NNablaNAS, Sony's new open-source toolbox for hardware-aware NAS.
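The constrained-search setup can be sketched with random search over a toy search space. The operator table, its latency numbers, and the accuracy proxy below are all invented for illustration (a real hardware-aware NAS, including NNablaNAS, measures or predicts costs on the target device and uses far stronger search strategies):

```python
import random

random.seed(0)

# Hypothetical per-op (latency_ms, accuracy_proxy) lookup table.
OPS = {"conv3x3": (1.0, 3.0), "conv5x5": (2.5, 4.0), "skip": (0.1, 0.5)}

def latency(arch):
    return sum(OPS[o][0] for o in arch)

def score(arch):
    return sum(OPS[o][1] for o in arch)

def constrained_search(budget_ms, num_blocks=4, trials=500):
    """Toy hardware-aware NAS: maximize the accuracy proxy subject to a
    latency budget, by sampling random architectures and keeping the best
    feasible one."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = [random.choice(list(OPS)) for _ in range(num_blocks)]
        if latency(arch) <= budget_ms and score(arch) > best_score:
            best, best_score = arch, score(arch)
    return best
```

The key point is the feasibility check: the unconstrained optimum (all large convolutions) violates the budget, so the search settles on the best architecture that actually fits the hardware.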

  • Lukas Mauch

    Sony R&D Center Europe, Germany

He has been with Sony R&D Center Europe (Germany) since 2019, working on efficient inference methods for deep neural networks. Starting in 2009, he studied at the University of Stuttgart, receiving his M.Sc. in electrical engineering in 2014. From 2014 to 2019, he developed DNN compression methods as a research assistant at the Institute for Signal Processing and System Theory (ISS) at the University of Stuttgart.

Technology 7
Mixed Precision Quantization of Deep Neural Networks

In order to deploy AI applications to edge devices, it is essential to reduce DNN footprints. Sony R&D Center is researching and developing methods and tools for this purpose. In particular, we have been investigating new approaches for training quantized DNNs, which lead to reductions in memory, computation, and energy footprints. For example, our paper presented at the ICLR 2020 conference, entitled "Mixed Precision DNNs: All you need is a good parametrization", shows how we can train DNNs to optimally distribute bitwidths across layers given a specific memory budget. The resulting mixed precision MobileNetV2 reduces the required memory by nearly 10x without significant loss of accuracy on the ImageNet classification benchmark.
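The building block being parametrized is a uniform quantizer. The sketch below shows the bitwidth/step-size trade-off the paper learns per layer; the training machinery (straight-through gradients, the specific parametrizations compared in the paper) is omitted, and the example values are arbitrary:

```python
import numpy as np

def quantize_uniform(w, bits, step):
    """Symmetric uniform quantizer with a given bitwidth and step size; in the
    ICLR 2020 paper, both quantities are parametrized and learned per layer."""
    levels = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / step), -levels, levels) * step

w = np.linspace(-1.0, 1.0, 9)
w8 = quantize_uniform(w, bits=8, step=1 / 127)   # fine grid, low error
w2 = quantize_uniform(w, bits=2, step=1.0)       # three levels, high error
```

Learning where to spend bits means accepting the coarse, low-memory representation in layers that tolerate it and the fine one only where accuracy demands it.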

  • Fabien Cardinaux

    Sony R&D Center Europe, Germany

Dr. Fabien Cardinaux leads an R&D team at Sony R&D Center Europe in Stuttgart (Germany). Prior to joining Sony in 2011, he worked as a postdoc at the University of Sheffield (UK). In 2005, he obtained a PhD from EPFL (Switzerland) for his work on machine learning methods applied to face authentication. His current research interests lie in deep neural network footprint reduction, neural architecture search, and audio content creation. Fabien contributes to academic research by regularly publishing and reviewing for major machine learning conferences.

Technology 8
Training of Extremely Large-scale Neural Networks Beyond GPU Memory Limitation

While large neural networks demonstrate higher performance in various tasks, training large networks is difficult due to limitations on GPU memory size. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks whose sizes exceed the allotted GPU memory. Under a given memory budget, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves overlap between computation and memory transfers. Additionally, we apply a virtual addressing technique, commonly used in operating systems, to the training of neural networks with out-of-core execution, which drastically reduces the amount of memory fragmentation caused by frequent memory transfers. Our algorithm successfully trains several models with larger batch sizes and faster computation than the state-of-the-art method.
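Why overlap matters can be seen with a toy cost model (illustrative only; the actual scheduler reasons about per-function memory usage, not just time): when a layer's host-device transfer runs concurrently with computation, only the longer of the two appears on the critical path.

```python
def step_time(compute_ms, transfer_ms, overlap):
    """Toy cost model for out-of-core training: per-layer compute time vs.
    host-device transfer time for offloaded data, with or without overlap."""
    if overlap:
        # Each transfer hides behind its layer's computation; only the
        # longer of the two contributes to the critical path.
        return sum(max(c, t) for c, t in zip(compute_ms, transfer_ms))
    return sum(compute_ms) + sum(transfer_ms)

# Hypothetical per-layer timings in milliseconds.
compute = [4.0, 6.0, 5.0]
transfer = [3.0, 2.0, 7.0]
```

Here overlapping drops the step time from 27 ms to 17 ms; the paper's scheduler adapts transfer timing to maximize exactly this kind of hiding.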

  • Akio Hayakawa

    Sony R&D Center Tokyo, Japan

He received his master's degree from The University of Tokyo in 2018, after which he joined Sony. He has been working on the development of Sony's deep learning framework, Neural Network Libraries, especially optimizing the performance of core algorithms, including GPU computation, the graph engine, and memory allocation. He has also been conducting core research on content generation to support Sony's entertainment activities.

Business use case

Case 1
Sony's World's First Intelligent Vision Sensors with AI Processing Functionality Enabling High-Speed Edge AI Processing and Contributing to Building of Optimal Systems Linked with the Cloud

Sony Corporation announced two models of intelligent vision sensors, the first image sensors in the world to be equipped with AI processing functionality. Including AI processing functionality on the image sensor itself enables high-speed edge AI processing and extraction of only the necessary data, which, when using cloud services, reduces data transmission latency, addresses privacy concerns, and reduces power consumption and communication costs.

Fig. Intelligent Vision Sensor

Case 2
Image Recognition Technology on aibo
~ Building Eyes of aibo to Live in Homes ~

aibo uses its eyes (cameras) to recognize its surroundings and modifies its behavior based on what it sees. It identifies owners and detects friends (other aibo units) and toys to play with. An edge-friendly, lightweight deep-learning-based algorithm enables rapid recognition for interacting with people. With its depth sensor, aibo can reach toys (e.g., pink balls and bones) while avoiding obstacles. A camera on its back enables SLAM for building a map of the room, which is used for go-to-charger and patrol applications.

Fig. Sensors/Display device on aibo

Fig. Image Recognition Technology on aibo

Case 3
Neural Network Console

Neural Network Console is an integrated development environment for deep learning that enables full-scale research and development through a GUI. This software has already been utilized in many products and services within the Sony Group, improving the productivity of deep learning research and development, as well as supporting the development of deep learning talent, demand for which has rapidly expanded in recent years.

Fig. The New Deep Learning Experience


  • Francesco Cartella, Orlando Anunciação, Yuki Funabiki, Daisuke Yamaguchi, Toru Akishita, Olivier Elshocht


Guaranteeing the security of transactional systems is a crucial priority for all institutions that process transactions, in order to protect their businesses against cyberattacks and fraudulent attempts. Adversarial attacks are novel techniques that, beyond being proven effective at fooling image classification models, can also be applied to tabular data. Adversarial attacks aim at producing adversarial examples, in other words, slightly modified inputs that induce the Artificial Intelligence (AI) system to return incorrect outputs that are advantageous for the attacker. In this paper we illustrate a novel approach to modify and adapt state-of-the-art algorithms to imbalanced tabular data, in the context of fraud detection. Experimental results show that the proposed modifications lead to a perfect attack success rate, obtaining adversarial examples that are also less perceptible when analyzed by humans. Moreover, when applied to a real-world production system, the proposed techniques show the possibility of posing a serious threat to the robustness of advanced AI-based fraud detection procedures.
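The attack geometry on tabular data can be sketched with an invented linear fraud scorer (real detectors and the paper's algorithms are far more sophisticated; the feature names and weights here are purely hypothetical): small steps against the score gradient flip the prediction while keeping the transaction close to the original.

```python
import numpy as np

# Hypothetical linear fraud scorer: score > 0 means "fraud".
weights = np.array([0.8, -0.2, 1.5])   # e.g. amount, account_age, velocity
x = np.array([1.2, 0.5, 0.9])          # a transaction initially flagged

def is_fraud(v):
    return float(weights @ v) > 0.0

def adversarial_example(v, step=0.05, max_iter=200):
    """White-box attack sketch: nudge features against the score gradient
    (for a linear model, simply the weight vector) until the label flips."""
    v = v.copy()
    direction = weights / np.linalg.norm(weights)
    for _ in range(max_iter):
        if not is_fraud(v):
            break
        v -= step * direction
    return v
```

Imbalanced tabular settings add the constraints the paper addresses, such as keeping perturbations within plausible feature ranges and restricting them to attacker-editable fields.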

Paper by a member of Sony AI

  • Torgyn Shaikhina, Umang Bhatt, Roxanne Zhang, Konstantinos Georgatzis, Alice Xiang and Adrian Weller


    Post-hoc feature importance scores are one method for explaining machine learning model outputs. In this paper, we explore the effects of uncertainty on the quality of these explanations. Specifically, we develop an approach to quantitatively connect the uncertainty in model predictions with the variance, complexity, monotonicity, efficiency, and faithfulness of the explanations. We find that feature importance explanations for out-of-distribution data perform poorly on these evaluation criteria. We further demonstrate that uncertainty in predictions among a set of candidate models—identical in model specification but obtained by subsampling from the training set—propagates to uncertainty in the feature importance scores, resulting in arbitrary explanations for a given sample. We analyze the effect of the number of candidate models and subsample size on measures of feature importance. Our findings suggest that in the presence of uncertainty, current feature importance techniques are unreliable.
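The candidate-model construction can be sketched on synthetic data. This illustrative setup uses least-squares coefficients as importance scores (a stand-in for the post-hoc explainers studied in the paper) and refits identical models on training subsamples to measure how the scores vary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only feature 0 actually drives the target.
X = rng.normal(size=(400, 3))
y = X[:, 0] + 0.1 * rng.normal(size=400)

def importance(Xs, ys):
    """Least-squares coefficients stand in for feature importance scores."""
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Candidate models: identical specification, fit on different subsamples.
scores = np.array([importance(X[idx], y[idx])
                   for idx in (rng.choice(400, size=100, replace=False)
                               for _ in range(50))])
spread = scores.std(axis=0)   # importance uncertainty across candidate models
```

In this well-specified, in-distribution toy case the spread is small; the paper's point is that under model uncertainty or out-of-distribution inputs this spread grows, making the explanations arbitrary.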

  • William Macke, Reuth Mirsky, and Peter Stone


A desirable goal for autonomous agents is to be able to coordinate on the fly with previously unknown teammates. Known as "ad hoc teamwork", enabling such a capability has been receiving increasing attention in the research community. One of the central challenges in ad hoc teamwork is quickly recognizing the current plans of other agents and planning accordingly. In this paper, we focus on the scenario in which teammates can communicate with one another, but only at a cost. Thus, they must carefully balance plan recognition based on observations vs. that based on communication. This paper proposes a new metric for evaluating how similar two policies that a teammate may be following are: the Expected Divergence Point (EDP). We then present a novel planning algorithm for ad hoc teamwork, determining which query to ask and planning accordingly. We demonstrate the effectiveness of this algorithm in a range of increasingly general communication problems in ad hoc teamwork.

  • Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, and Peter Stone


In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experience. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. In order to avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function that provides additional reward throughout the learning process. We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up the average-reward learning rate without any reduction in the performance of the learned policy compared to relevant baselines.
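The mechanics of potential-based shaping are easy to sketch (illustrative only; the paper's contribution is proving policy invariance in the average-reward setting and deriving the potential from a temporal logic formula): the shaping term is a potential difference, so along any cycle it telescopes to zero and cannot change a policy's long-run reward rate.

```python
def shaped_reward(r, s, s_next, potential):
    """Potential-based shaping, written without a discount factor as fits
    the average-reward setting: add the potential difference to the reward."""
    return r + potential(s_next) - potential(s)

# Over a cycle the shaping terms telescope to zero, leaving the average
# reward of any recurrent policy unchanged.
phi = lambda s: float(s * s)       # an arbitrary example potential
cycle = [0, 1, 2, 0]
extra = sum(shaped_reward(0.0, cycle[i], cycle[i + 1], phi) for i in range(3))
```

The shaping still matters during learning: intermediate potential differences provide denser feedback, which is what speeds up convergence.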

  • Yu-Sian Jiang, Garret Warnell, and Peter Stone


    Human-robot shared autonomy techniques for vehicle navigation hold promise for reducing a human driver's workload, ensuring safety, and improving navigation efficiency. However, because typical techniques achieve these improvements by effectively removing human control at critical moments, these approaches often exhibit poor responsiveness to human commands—especially in cluttered environments. In this paper, we propose a novel goal-blending shared autonomy (GBSA) system, which aims to improve responsiveness in shared autonomy systems by blending human and robot input during the selection of local navigation goals as opposed to low-level motor (servo-level) commands. We validate the proposed approach by performing a human study involving an intelligent wheelchair and compare GBSA to a representative servo-level shared control system that uses a policy-blending approach. The results of both quantitative performance analysis and a subjective survey show that GBSA exhibits significantly better system responsiveness and induces higher user satisfaction than the existing approach.
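The contrast with servo-level blending can be sketched minimally (an illustrative rule, not the paper's actual GBSA formulation; the mixing weight and grid are invented): inputs are combined at the level of navigation goals, and the blended goal is snapped to a traversable cell rather than mixing low-level motor commands.

```python
import numpy as np

def blend_goal(human_goal, robot_goal, free_cells, alpha=0.6):
    """Toy goal-level blending: mix the human's intended goal with the
    robot's proposed safe goal, then snap the result to the nearest
    traversable cell. A local planner would then drive to that cell."""
    target = (alpha * np.asarray(human_goal, float)
              + (1 - alpha) * np.asarray(robot_goal, float))
    cells = np.asarray(free_cells, float)
    return tuple(cells[np.argmin(np.linalg.norm(cells - target, axis=1))])

# Hypothetical traversable cells in a small occupancy grid.
free = [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the human's input shapes where the robot is going rather than being overridden at the servo level, the system stays responsive to commands even in clutter.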

  • Peter Stone


Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that needs to be attentively and intentionally provided. In this paper, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as reward, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task. We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

  • Peter Stone

    Invited talk summary

    As autonomous agents proliferate in the real world, both in software and robotic settings, they will increasingly need to band together for cooperative activities with previously unfamiliar teammates. In such "ad hoc" team settings, team strategies cannot be developed a priori. Rather, an agent must be prepared to cooperate with many types of teammates: it must collaborate without pre-coordination. This talk will cover past and ongoing research on the challenge of building autonomous agents that are capable of robust ad hoc teamwork.
