


International Joint Conferences on Artificial Intelligence

International Joint Conferences on Artificial Intelligence (IJCAI) is a non-profit corporation founded in California in 1969 for scientific and educational purposes. These include disseminating information on Artificial Intelligence at conferences where cutting-edge scientific results are presented, and through materials from these meetings in the form of proceedings, books, video recordings, and other educational materials. IJCAI consists of two divisions: the Conference Division and the AI Journal Division. IJCAI conferences are the premier international gatherings of AI researchers and practitioners; they were held biennially in odd-numbered years starting in 1969. Since 2016, IJCAI conferences have been held annually.

Thursday, January 7 through Friday, January 15, 2021
(IJCAI-2020 is a virtual-only conference)

As many of you already know, IJCAI-2020 (the 29th International Joint Conference on Artificial Intelligence) has gone virtual. As one of the event's sponsors, Sony had intended to hold sessions and host a technology exhibition at its sponsor booth. On this site, we introduce several of Sony's latest technologies combining AI, sensing, and computer vision, including some that are still in development. While it may not be the same as meeting in person, we are keen to contribute to the virtual conference in any way we can.


Technology 1
Reinforcement Learning for Optimization of COVID-19 Mitigation Policies

The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world face the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this research, we study how reinforcement learning (RL) can be used to optimize mitigation policies that minimize the economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, can model fine-grained interactions among people at specific locations in a community; and (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator. Our results validate both the overall simulator behavior and the learned policies under realistic conditions.

Joint research by Varun Kompella, Roberto Capobianco, Stacy Jong, Jonathan Browne, Spencer Fox, Lauren Meyers, Peter Wurman, and Peter Stone.
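To make the optimization target concrete, here is a minimal sketch, not the authors' simulator or training setup. It assumes (hypothetically) a discrete set of regulation stages with made-up economic costs, a per-step reward that penalizes both economic cost and any excess of hospitalized patients over capacity, and a tabular Q-learning update chosen only for brevity; the RL algorithm used in the research may differ. The names `STAGE_COST`, `reward`, `q_update`, and all constants are illustrative.

```python
from collections import defaultdict

# Hypothetical economic cost per regulation stage (0 = no restrictions, 4 = strictest).
STAGE_COST = [0.0, 0.1, 0.3, 0.6, 1.0]


def reward(stage, hospitalized, capacity, overflow_penalty=5.0):
    """Illustrative reward: negative economic cost minus a penalty for exceeding hospital capacity."""
    economic = STAGE_COST[stage]
    overflow = max(0, hospitalized - capacity) / capacity  # relative excess over capacity
    return -(economic + overflow_penalty * overflow)


def q_update(q, state, action, r, next_state, n_actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_target = r + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: stage 2 is active while hospitals are 20% over capacity.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
r = reward(stage=2, hospitalized=120, capacity=100)
q_update(q_table, state="high_load", action=2, r=r, next_state="medium_load", n_actions=5)
```

The overflow penalty weight encodes the trade-off described above: a larger value pushes the learned policy toward stricter stages whenever hospital capacity is at risk.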

  • Peter Stone

    Sony AI America

    Professor Peter Stone is the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as associate department chair and Director of Texas Robotics. He was a co-founder of Cogitai, Inc. and is now Executive Director of Sony AI America. His main research interest in AI is understanding how we can best create complete intelligent agents. He considers adaptation, interaction, and embodiment to be essential capabilities of such agents, so his research focuses mainly on machine learning, multiagent systems, and robotics. To him, the most exciting research topics are those inspired by challenging real-world problems. He believes that complete, successful research includes both precise, novel algorithms and fully implemented, rigorously evaluated applications. His application domains have included robot soccer, autonomous bidding agents, autonomous vehicles, and human-interactive agents.

Technology 2
Hypotheses Generation for Applications in Biomedicine and Gastronomy

Hypothesis generation is the problem of discovering meaningful, implicit connections in a particular domain. We focus on two application areas: (1) biomedicine, where the goal is to discover new connections between scientific terms such as diseases, chemicals, drugs, and genes; and (2) food pairing, where the goal is to discover new connections between ingredients and taste and flavor molecules. Sony AI and its academic partners have developed a variety of models that explore representation learning and novel link prediction for these tasks. In the biomedical domain, we developed models that leverage temporal data about how connections between concepts have emerged over the last 80 years. In the food domain, we deal with multipartite graphs that link ingredients with molecule information and with the health aspects of ingredients. The talk will introduce hypothesis generation as a graph embedding (representation learning) and link prediction task. We'll present recently published models that integrate (1) variational inference for estimating priors, (2) graph embedding learning regimes, and (3) the application of embeddings in training ranking models.
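Framed as link prediction, a hypothesis is an unobserved edge whose endpoints have similar learned representations. The sketch below illustrates only that final ranking step, under loud assumptions: the 3-dimensional embeddings are made up rather than learned, cosine similarity stands in for a trained scoring or ranking model, and the names `EMBEDDINGS` and `rank_candidates` are hypothetical.

```python
import math

# Hypothetical 3-d embeddings; in practice these would be learned from the concept graph.
EMBEDDINGS = {
    "tomato": [0.9, 0.1, 0.3],
    "basil": [0.8, 0.2, 0.4],
    "chocolate": [0.1, 0.9, 0.2],
    "strawberry": [0.3, 0.7, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_candidates(query, embeddings, known_links=()):
    """Rank nodes not yet linked to `query` by similarity (higher = more plausible new link)."""
    scores = [
        (other, cosine(embeddings[query], vec))
        for other, vec in embeddings.items()
        if other != query and other not in known_links
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates("tomato", EMBEDDINGS)
# → basil, strawberry, chocolate (highest to lowest similarity)
```

Excluding `known_links` matters: hypothesis generation is about proposing connections that are not already in the graph.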

  • Michael Spranger

    Sony AI Tokyo, Japan

    Dr. Michael Spranger is the COO of Sony AI Inc., Sony's strategic research and development organization established in April 2020. Sony AI's mission is to "unleash human imagination and creativity with AI." Michael is a roboticist by training with extensive research experience in fields such as natural language processing, robotics, and the foundations of artificial intelligence. He has published more than 60 papers at top AI conferences such as IJCAI and NeurIPS. In addition to his role at Sony AI, Michael holds a researcher position at Sony Computer Science Laboratories, Inc., and actively contributes to Sony's overall AI ethics strategy.

Technology 3
Sensing, AI, and Robotics at Sony AI

In this short video, I will introduce our approach to combining sensing, AI, and robotics at Sony AI.

  • Peter Duerr

    Sony AI Zürich, Switzerland

    Dr. Peter Duerr is the Director of Sony AI in Zürich. Since joining Sony in 2011, he has worked on computer vision, AI, and robotics research in various assignments at Sony R&D Center and Aerosense in Tokyo, at Sony R&D Center Europe, and most recently at Sony AI in Zürich. Peter holds an MSc in mechanical engineering from ETH Zürich and a PhD in computer and communication science from EPFL in Lausanne.

Technology 4
Content Restoration/Protection/Generation Technologies

The first work is titled "D3Net: Densely connected multidilated DenseNet for music source separation". Dense prediction tasks, such as semantic segmentation and audio source separation, involve high-resolution inputs and outputs. To efficiently model both the local and global structure of the data, we propose dense multiresolution learning, which combines a dense skip-connection topology with a novel multidilated convolution. The proposed D3Net achieves state-of-the-art performance on music source separation, where the goal is to separate the individual instrument sounds from a music mixture. A demo is available in a separate video, so please check it out!
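The multidilated convolution can be illustrated with a simplified, single-channel 1-D sketch of our reading of the idea: inside a densely connected block, each skip-connected input feature map is convolved with its own dilation factor (here, assumed to be 2**i for the i-th input) and the results are summed, so one layer sees several resolutions at once. This is not the D3Net implementation; channel handling, padding, and the dilation schedule are assumptions, and the function names are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-length 1-D dilated cross-correlation with zero padding; kernel length must be odd."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def multidilated_conv1d(inputs, kernels, base=2):
    """Convolve the i-th input feature map with dilation base**i and sum the outputs."""
    out = [0.0] * len(inputs[0])
    for i, (x, kernel) in enumerate(zip(inputs, kernels)):
        y = dilated_conv1d(x, kernel, dilation=base ** i)
        out = [a + b for a, b in zip(out, y)]
    return out


# Usage: two skip-connected inputs, the first seen at dilation 1 and the second at dilation 2.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
combined = multidilated_conv1d([impulse, impulse], [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]])
```

Because later inputs use wider dilations, the receptive field grows exponentially with depth while each individual kernel stays small.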

In the second paper, we investigate adversarial attacks on audio source separation. We found that, under a white-box condition, it is possible to severely degrade separation performance by adding imperceptible noise to the input mixture, whereas under a black-box condition, source separation methods exhibit a certain level of robustness. We believe this work is important for understanding the robustness of source separation models, as well as for developing content protection methods against the abuse of separated signals.
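A white-box attacker can compute the gradient of the separation error with respect to the input mixture and step in the direction that increases it. The sketch below uses the standard one-step fast gradient sign method (FGSM) as a stand-in; the paper's actual attack may differ. The "separator" is a hypothetical toy that multiplies the mixture by a fixed mask, so the gradient can be written by hand; `epsilon` bounds the per-sample change, which is what keeps the noise small.

```python
def separation_loss(mixture, mask, target):
    """Squared error between a toy masking separator's estimate (mask * mixture) and the target source."""
    return sum((m * x - s) ** 2 for x, m, s in zip(mixture, mask, target))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(mixture, mask, target, epsilon):
    """One-step white-box attack: move each sample by epsilon in the loss-increasing direction."""
    # d/dx_i of (m_i * x_i - s_i)^2 is 2 * (m_i * x_i - s_i) * m_i
    grads = [2 * (m * x - s) * m for x, m, s in zip(mixture, mask, target)]
    return [x + epsilon * sign(g) for x, g in zip(mixture, grads)]


# Usage with made-up signals.
mixture = [0.5, -0.2, 0.8]
mask = [0.9, 0.5, 0.7]
target = [0.4, -0.3, 0.5]
adversarial = fgsm_perturb(mixture, mask, target, epsilon=0.01)
```

In a black-box setting the attacker cannot compute `grads` from the model, which is one intuition for why the observed robustness differs between the two conditions.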

The last paper investigates posterior collapse in Gaussian VAEs. The VAE is a well-known generative model, valued for its tractability and training stability, but it suffers from posterior collapse. We investigated the cause of posterior collapse in Gaussian VAEs from the viewpoint of local Lipschitz smoothness, and then proposed a modified ELBO-based objective function that automatically adapts a hyperparameter in the ELBO. Our new objective function prevents the over-smoothing of the decoder that leads to posterior collapse.
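In a Gaussian VAE, the decoder variance σ² acts as a hyperparameter that weights reconstruction against the KL term: if it is too large, the KL term dominates and the decoder over-smooths. As an illustration only (an assumption on our part, not necessarily the paper's exact objective), the sketch below sets σ² to its closed-form ELBO maximizer, the per-dimension mean squared reconstruction error, so the weight adapts during training instead of being fixed by hand. Function names are hypothetical; a single latent sample is assumed.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def gaussian_elbo(x, x_hat, mu, logvar, sigma2):
    """Single-sample ELBO for a Gaussian decoder p(x|z) = N(x_hat, sigma2 * I)."""
    d = len(x)
    sse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    log_lik = -sse / (2.0 * sigma2) - 0.5 * d * math.log(2.0 * math.pi * sigma2)
    return log_lik - gaussian_kl(mu, logvar)


def adaptive_sigma2(x, x_hat, floor=1e-6):
    """Maximizer of the ELBO over sigma2 (set dELBO/dsigma2 = 0): the mean squared error."""
    sse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return max(sse / len(x), floor)  # floor avoids a degenerate zero variance


# Usage with made-up data and encoder outputs.
x, x_hat = [1.0, 2.0, 3.0], [1.1, 1.8, 3.3]
mu, logvar = [0.2, -0.1], [-0.5, 0.1]
s2 = adaptive_sigma2(x, x_hat)
```

Setting the derivative of the ELBO with respect to σ² to zero gives SSE/(2σ⁴) = D/(2σ²), that is σ² = SSE/D, which is what `adaptive_sigma2` returns.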

  • Naoya Takahashi

    Sony R&D Center Tokyo, Japan

    He received his Ph.D. from the University of Tsukuba, Japan, in 2020. Since joining Sony Corporation in 2008, he has conducted research in audio, computer vision, and machine learning. From 2015 to 2016, he worked as a visiting researcher at the Computer Vision Lab and the Speech Processing Group at ETH Zurich. In 2018, he received the Sony Outstanding Engineer Award, the highest form of individual recognition for Sony Group engineers. His current research interests include audio source separation, semantic segmentation, video highlight detection, event detection, speech recognition, and music technologies.

Technology 5
Video Colorization for Content Revitalization

Sony is highly interested in revitalizing old content, and video colorization is one such effort. However, video colorization is challenging because of issues with temporal coherence and user controllability. We introduce our unique reference-based video colorization method and demonstrate that it can alleviate these issues, helping to revitalize old black-and-white videos.

  • Andrew Shin

    Sony R&D Center Tokyo, Japan

    He received his Ph.D. from The University of Tokyo in 2017 and then joined Sony. He has been working on the development of Sony's deep learning framework, Neural Network Libraries, developing machine learning tools to support content creation for Sony's entertainment business, and conducting core research.

  • Naofumi Akimoto

    Sony R&D Center Tokyo, Japan

    He received a master's degree in engineering from Keio University in 2020 and then joined Sony. He has been researching machine learning technologies for image and video enhancement for Sony's entertainment business.

Technology 6
Neural Architecture Search for Automating The Design of Deep Neural Networks

The application of deep neural networks (DNNs) typically involves a large amount of network engineering. In practice, this is a laborious and very time-consuming task that must be performed with great care to obtain maximum performance and efficiency, and it requires considerable expertise, experience, and intuition. At Sony, we are very interested in hardware-aware neural architecture search (NAS) algorithms that automate the architecture design process. Hardware-aware means that we not only optimize DNN performance but also enforce given hardware constraints, such as memory, power, or latency limits. In this presentation, we present NNablaNAS, Sony's new open-source toolbox for hardware-aware NAS.
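The difference between plain and hardware-aware search is that candidates violating the hardware constraint are excluded before accuracy is compared. The toy sketch below shows this with exhaustive search over a tiny space; it is not the NNablaNAS API. The search space, the linear latency model, and the accuracy proxy are all hypothetical stand-ins for measured latency and trained validation accuracy.

```python
import itertools

# Hypothetical search space: channel width for each of three layers.
WIDTH_CHOICES = [16, 32, 64]
LATENCY_PER_CHANNEL_MS = 0.05  # assumed linear latency model, not measured on real hardware


def latency_ms(arch):
    """Estimated latency of an architecture under the assumed linear model."""
    return LATENCY_PER_CHANNEL_MS * sum(arch)


def accuracy_proxy(arch):
    """Stand-in for validation accuracy: wider layers help, with diminishing returns."""
    return sum(1.0 - 8.0 / w for w in arch) / len(arch)


def search(latency_budget_ms):
    """Return the best-scoring architecture whose latency fits the budget, or None."""
    feasible = [
        arch
        for arch in itertools.product(WIDTH_CHOICES, repeat=3)
        if latency_ms(arch) <= latency_budget_ms
    ]
    if not feasible:
        return None
    return max(feasible, key=accuracy_proxy)


best_under_6ms = search(6.0)
```

Real NAS spaces are far too large for exhaustive enumeration, which is why practical toolboxes rely on gradient-based or sampling-based search; the constraint check, however, plays the same role.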

  • Lukas Mauch

    Sony R&D Center Europe, Germany

    He has been with Sony R&D Center Europe (Germany) since 2019, working on efficient inference methods for deep neural networks. He began his studies at the University of Stuttgart in 2009 and received his M.Sc. in electrical engineering in 2014. From 2014 to 2019, he developed DNN compression methods as a research assistant at the Institute for Signal Processing and System Theory (ISS) at the University of Stuttgart.

Technology 7
Mixed Precision Quantization of Deep Neural Networks

To deploy AI applications on edge devices, it is essential to reduce DNN footprints. Sony R&D Center is researching and developing methods and tools for this purpose. In particular, we have been investigating new approaches for training quantized DNNs, which reduce memory, computation, and energy footprints. For example, our ICLR 2020 paper, "Mixed Precision DNNs: All you need is a good parametrization", shows how DNNs can be trained to optimally distribute bitwidths across layers under a given memory budget. The resulting mixed-precision MobileNetV2 requires nearly 10x less memory without significant loss of accuracy on the ImageNet classification benchmark.
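To see how a layer's bitwidth, and thus the memory footprint, can follow from its quantizer parameters, consider a uniform quantizer described by a step size and a maximum value. The sketch below assumes a symmetric signed quantizer and a ceil-based bitwidth formula of our own choosing; it is not the nnabla API or the paper's exact parametrization, and all names are hypothetical.

```python
import math


def quantize(x, step, q_max):
    """Symmetric uniform quantizer: clip to [-q_max, q_max], then round to a multiple of `step`."""
    clipped = max(-q_max, min(q_max, x))
    return round(clipped / step) * step


def bitwidth(step, q_max):
    """Bits needed for the signed grid {-q_max, ..., 0, ..., q_max} with spacing `step`."""
    # q_max / step positive levels plus zero, then one extra bit for the sign.
    return math.ceil(math.log2(q_max / step + 1)) + 1


def memory_bytes(layers):
    """Total weight memory in bytes for a list of (num_params, bits) pairs."""
    return sum(n * b for n, b in layers) / 8


# Usage: a 4-bit layer with 1000 weights and an 8-bit layer with 2000 weights.
bits = bitwidth(step=0.25, q_max=1.75)
total = memory_bytes([(1000, bits), (2000, 8)])
```

Because the bitwidth is a function of trainable quantities such as the step size, a memory budget can be enforced on `memory_bytes` during training, and each layer ends up with its own precision.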

  • Fabien Cardinaux

    Sony R&D Center Europe, Germany

    Dr. Fabien Cardinaux leads an R&D team at Sony R&D Center Europe in Stuttgart (Germany). Before joining Sony in 2011, he worked as a postdoc at the University of Sheffield (UK). In 2005, he obtained a PhD from EPFL (Switzerland) for his work on machine learning methods applied to face authentication. His current research interests lie in deep neural network footprint reduction, neural architecture search, and audio content creation. Fabien contributes to academic research by regularly publishing and reviewing for major machine learning conferences.

Technology 8
Training of Extremely Large-scale Neural Networks Beyond GPU Memory Limitation

While large neural networks demonstrate higher performance on various tasks, training them is difficult due to limited GPU memory. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks whose size exceeds the allotted GPU memory. Under a given memory budget, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves the overlap between computation and memory transfers. Additionally, we apply virtual addressing, a technique commonly used in operating systems, to neural network training with out-of-core execution, which drastically reduces the memory fragmentation caused by frequent memory transfers. Our algorithm successfully trains several models with larger batch sizes and faster computation than the state-of-the-art method.
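The benefit of adapting transfer timing to per-function memory usage can be shown with a toy model of our own, not the paper's scheduler: the next function's data is prefetched during the current function only when both working sets fit in the budget at once, and an overlapped transfer costs time only if it outlasts the current computation. Virtual addressing is not modeled. All names and numbers are hypothetical.

```python
def plan_transfers(mem_per_function, budget):
    """plan[i] is True if function i+1's data can be prefetched while function i runs."""
    if any(m > budget for m in mem_per_function):
        raise ValueError("a single function exceeds the memory budget")
    # Overlap only where both working sets fit on the device at the same time.
    return [
        mem_per_function[i] + mem_per_function[i + 1] <= budget
        for i in range(len(mem_per_function) - 1)
    ]


def estimated_time(compute_ms, transfer_ms, plan):
    """Toy timeline: an overlapped transfer hides behind the previous function's compute."""
    total = transfer_ms[0] + compute_ms[0]  # the first function's data is loaded up front
    for i, overlapped in enumerate(plan):
        t = transfer_ms[i + 1]
        wait = max(0.0, t - compute_ms[i]) if overlapped else t
        total += wait + compute_ms[i + 1]
    return total


# Usage: four functions with made-up memory needs (GB), compute times, and transfer times (ms).
plan = plan_transfers([3, 4, 5, 2], budget=8)
elapsed = estimated_time([2, 2, 2, 2], [1, 1, 3, 1], plan)
```

In this toy setting, only the transfer into the memory-hungry third function has to be serialized; the other two are hidden behind computation.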

  • Akio Hayakawa

    Sony R&D Center Tokyo, Japan

    He received a master's degree from The University of Tokyo in 2018 and then joined Sony. He has been working on the development of Sony's deep learning framework, Neural Network Libraries, especially on optimizing the performance of core algorithms, including GPU computation, the graph engine, and memory allocation. He has also been conducting core research on content generation to support Sony's entertainment activities.

Business use case

Case 1
Sony's World's First Intelligent Vision Sensors with AI Processing Functionality Enabling High-Speed Edge AI Processing and Contributing to Building of Optimal Systems Linked with the Cloud

Sony Corporation announced two models of intelligent vision sensors, the first image sensors in the world to be equipped with AI processing functionality. Including AI processing functionality on the image sensor itself enables high-speed edge AI processing and extraction of only the necessary data, which, when using cloud services, reduces data transmission latency, addresses privacy concerns, and reduces power consumption and communication costs.

Fig. Intelligent Vision Sensor

Case 2
Image Recognition Technology on aibo
~ Building Eyes of aibo to Live in Homes ~

aibo uses its eyes (cameras) to recognize its surroundings and modifies its behavior based on what it sees. It identifies its owners and detects friends (other aibos) and toys to play with. A lightweight, edge-friendly algorithm based on deep learning lets aibo recognize these quickly so that it can interact with people. With its depth sensor, it can reach toys (e.g., pink balls and bones) while avoiding obstacles. A camera on its back performs SLAM to build a map of the room, which is used for returning to the charger and for patrol applications.

Fig. Sensors/Display device on aibo

Fig. Image Recognition Technology on aibo

Case 3
Neural Network Console

Neural Network Console is an integrated development environment for deep learning that enables full-scale research and development through a GUI. The software has already been used in many products and services within the Sony Group, improving the productivity of deep learning research and development and supporting the effective training of deep learning talent, for which demand has expanded rapidly in recent years.

Fig. The New Deep Learning Experience


by Sony AI members

  • Reuth Mirsky, William Macke, Andy Wang, Harel Yedidsion, and Peter Stone


    In ad hoc teamwork, multiple agents need to collaborate without prior knowledge of their teammates or their plans. A common assumption in this research area is that the agents cannot communicate. However, just as two random people may speak the same language, autonomous teammates may also happen to share a communication protocol. This paper considers how such a shared protocol can be leveraged, introducing a means to reason about Communication in Ad Hoc Teamwork (CAT). The goal of this work is to enable improved ad hoc teamwork by judiciously leveraging the team's ability to communicate. We situate our study within a novel CAT scenario involving tasks with multiple steps, where teammates' plans are revealed over time. In this context, the paper proposes methods to reason about the timing and value of communication and introduces an algorithm for an ad hoc agent to leverage these methods. Finally, we introduce a new multiagent domain, the tool fetching domain, and study how varying this domain's properties affects the usefulness of communication. Empirical results show the benefits of explicit reasoning about communication content and timing in ad hoc teamwork.

  • Ishan Durugkar, Elad Liebman, and Peter Stone


    In multiagent reinforcement learning scenarios, it is often the case that independent agents must jointly learn to perform a cooperative task. This paper focuses on such a scenario in which agents have individual preferences regarding how to accomplish the shared task. We consider a framework for this setting which balances individual preferences against task rewards using a linear mixing scheme. In our theoretical analysis we establish that agents can reach an equilibrium that leads to optimal shared task reward even when they consider individual preferences which are not fully aligned with this task. We then empirically show, somewhat counter-intuitively, that there exist mixing schemes that outperform a purely task-oriented baseline. We further consider empirically how to optimize the mixing scheme.
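For the communication paper above, reasoning about the value of a query can be illustrated with a generic one-step value-of-information test, which we assume here for illustration; it is not the paper's algorithm. In a tool-fetching flavor, the ad hoc agent is unsure which goal its teammate is pursuing, and it asks only if the expected saving from learning the goal exceeds the cost of communicating. The goals, costs, and function names are hypothetical.

```python
def expected_cost(belief, costs, action):
    """Expected cost of `action` under a belief {goal: probability}."""
    return sum(p * costs[(action, goal)] for goal, p in belief.items())


def should_ask(belief, costs, actions, comm_cost):
    """Ask the teammate only if knowing its goal saves more than the communication costs."""
    act_now = min(expected_cost(belief, costs, a) for a in actions)
    # After asking, the agent learns the true goal and picks the best action for it.
    after_asking = comm_cost + sum(
        p * min(costs[(a, goal)] for a in actions) for goal, p in belief.items()
    )
    return after_asking < act_now


# Usage: two equally likely goals; moving toward the wrong tool costs 10 steps.
belief = {"A": 0.5, "B": 0.5}
costs = {("go_A", "A"): 0, ("go_A", "B"): 10, ("go_B", "A"): 10, ("go_B", "B"): 0}
actions = ["go_A", "go_B"]
```

As the belief concentrates on one goal (for example, because the teammate's plan is revealed over time), acting immediately becomes cheaper and the same query stops being worth its cost, which is the timing aspect the paper studies.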
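For the multiagent paper, a linear mixing scheme can be sketched as each agent optimizing a convex combination of the shared task reward and its own preference reward. The exact form, including the per-agent weight `alpha` in [0, 1], is our assumption for illustration; the function names are hypothetical.

```python
def mixed_reward(task_reward, preference_reward, alpha):
    """Linear mix of the shared task reward and one agent's individual preference reward."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * task_reward + (1.0 - alpha) * preference_reward


def team_rewards(task_reward, preference_rewards, alphas):
    """Per-agent training rewards; each agent may use its own mixing weight."""
    return [mixed_reward(task_reward, p, a) for p, a in zip(preference_rewards, alphas)]


# Usage: agent 0 weighs task and preference equally; agent 1 is purely task-oriented.
rewards = team_rewards(2.0, [4.0, 0.0], [0.5, 1.0])
```

Setting every `alpha` to 1.0 recovers the purely task-oriented baseline mentioned above; optimizing the mixing scheme amounts to searching over these weights.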
