SONY

NeurIPS | 2020

Thirty-fourth Conference on Neural Information Processing Systems

The purpose of the Neural Information Processing Systems annual meeting is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.
The core focus is peer-reviewed novel research which is presented and discussed in the general session, along with invited talks by leaders in their field.

Sunday, December 6 through Saturday, December 12, 2020
(NeurIPS 2020 is a Virtual-only Conference)

Sony would like to extend our heartfelt condolences to those who have lost their lives to COVID-19 and to their families, and we pray for the speedy recovery of those currently battling the disease. As many of you already know, NeurIPS (Neural Information Processing Systems) 2020 is going virtual. Sony salutes NeurIPS 2020 and its general chairs for prioritizing the protection of people's lives by converting the conference to a virtual environment, and we thank the organizing committee staff for their swift preparations. As one of the sponsors of the event, Sony had intended to hold sessions and host a technology exhibition in the sponsor booth. It is very unfortunate that we cannot meet you all in Vancouver, but as an alternative we will introduce several of Sony's latest combined AI, sensor, and computer vision technologies on this site, including some that are still in development. While it may not be the same as meeting in person, we are keen to contribute to the first virtual conference in any way that we can. Together with our fellow NeurIPS 2020 participants, we readily join the global battle to defeat COVID-19 and, once this global health emergency has passed, we look forward to seeing you all in person and on site next year.

Technologies

Technology 1
Reinforcement Learning for Optimization of COVID-19 Mitigation Policies

The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this research, we study how reinforcement learning (RL) can be used to optimize mitigation policies that minimize the economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; and (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator. Our results validate both the overall simulator behavior and the learned policies under realistic conditions.

Joint research by Varun Kompella, Roberto Capobianco, Stacy Jong, Jonathan Browne, Spencer Fox, Lauren Meyers, Peter Wurman, and Peter Stone.

  • Peter Stone

    Sony AI America

    Professor Peter Stone is the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as associate department chair and Director of Texas Robotics. He was a co-founder of Cogitai, Inc. and is now Executive Director of Sony AI America. His main research interest in AI is understanding how best to create complete intelligent agents. He considers adaptation, interaction, and embodiment to be essential capabilities of such agents. Thus, his research focuses mainly on machine learning, multiagent systems, and robotics. To him, the most exciting research topics are those inspired by challenging real-world problems. He believes that complete, successful research includes both precise, novel algorithms and fully implemented and rigorously evaluated applications. His application domains have included robot soccer, autonomous bidding agents, autonomous vehicles, and human-interactive agents.

Technology 2
Hypotheses Generation for Applications in Biomedicine and Gastronomy

Hypothesis generation is the problem of discovering meaningful, implicit connections in a particular domain. We focus on two application areas: 1) biomedicine, and the discovery of new connections between scientific terms such as diseases, chemicals, drugs, and genes; and 2) food pairing, for discovering new connections between ingredients, taste, and flavor molecules. Sony AI and its academic partners have developed a variety of models that explore representation learning and novel link prediction models for these tasks. In the biomedical domain, we developed models able to leverage temporal data about how connections between concepts have emerged over the last 80 years. In the food domain, we deal with multi-partite graphs that link ingredients with molecule information and health aspects of ingredients. The talk will introduce hypothesis generation as a graph embedding, representation learning, and link prediction task. We'll present recently published models that integrate 1) variational inference for estimating priors, 2) graph embedding learning regimes, and 3) the application of embeddings in training ranking models.
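To make the link prediction framing concrete, the following minimal sketch scores unobserved pairs of terms by the sigmoid of the dot product of their embeddings and ranks them as candidate hypotheses. The embeddings, the node names, and the functions `link_score` and `rank_candidate_links` are hypothetical illustrations only; they are not the models presented in the talk, which learn the embeddings and train dedicated ranking models.

```python
import math
from itertools import combinations


def link_score(u, v):
    """Sigmoid of the dot product of two embedding vectors (a common, simple link score)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidate_links(embeddings, known_edges):
    """Rank every node pair that is not already connected, highest score first."""
    known = {frozenset(edge) for edge in known_edges}
    candidates = [
        (link_score(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if frozenset((a, b)) not in known
    ]
    return sorted(candidates, reverse=True)


# Hypothetical 2-D embeddings for four terms; real embeddings would be learned.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [-1.0, 0.2],
    "D": [0.8, 0.3],
}
known_edges = [("A", "B")]  # connections already observed in the graph

ranked = rank_candidate_links(embeddings, known_edges)
for score, a, b in ranked:
    print(f"{a}-{b}: {score:.3f}")
```

In this toy setup the highest-ranked unobserved pair is the one whose embeddings point in the most similar direction, which is the intuition behind treating hypothesis generation as link prediction.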

  • Michael Spranger

    Sony AI Tokyo, Japan

    Dr. Michael Spranger is the COO of Sony AI Inc., Sony's strategic research and development organization established in April 2020. Sony AI's mission is to "unleash human imagination and creativity with AI." Michael is a roboticist by training with extensive research experience in fields such as natural language processing, robotics, and the foundations of artificial intelligence. Michael has published more than 60 papers at top AI conferences such as IJCAI, NeurIPS and others. Alongside his role at Sony AI, Michael also holds a researcher position at Sony Computer Science Laboratories, Inc., and is actively contributing to Sony's overall AI Ethics strategy.

Technology 3
Sensing, AI, and Robotics at Sony AI

In this short video, I will introduce our approach to combining sensing, AI, and robotics at Sony AI.

  • Peter Duerr

    Sony AI Zürich, Switzerland

    Dr. Peter Duerr is the Director of Sony AI in Zürich. After joining Sony in 2011, he worked on computer vision, AI and robotics research in various assignments at Sony R&D Center and Aerosense in Tokyo, at Sony R&D Center Europe, and most recently at Sony AI in Zürich. Peter holds an MSc in mechanical engineering from ETH Zürich and a PhD in computer and communication science from EPFL in Lausanne.

Technology 4
Content Restoration/Protection/Generation Technologies

The first work is titled "D3Net: Densely connected multidilated DenseNet for music source separation". Dense prediction tasks, such as semantic segmentation and audio source separation, involve a high-resolution input and prediction. In order to efficiently model both the local and global structure of the data, we propose dense multiresolution learning, which combines a dense skip connection topology with a novel multidilated convolution. The proposed D3Net achieves state-of-the-art performance on a music source separation task, where the goal is to separate individual instrument sounds from a piece of music. A demo is available in another video, so please check it out!
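As a rough illustration of the idea behind a multidilated convolution, the sketch below applies a different dilation factor to each input channel and sums the results, so that a single layer mixes several receptive-field sizes. This is a simplified 1-D toy under our own assumptions (zero padding, one kernel per channel, the hypothetical function names `dilated_conv1d` and `multidilated_conv1d`), not the D3Net layer itself; the exact definition is given in the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D cross-correlation with zero padding; kernel taps are `dilation` samples apart."""
    center = len(kernel) // 2
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t + (k - center) * dilation
            if 0 <= index < len(signal):  # samples outside the signal count as zero
                total += weight * signal[index]
        output.append(total)
    return output


def multidilated_conv1d(channels, kernels, dilations):
    """Sum of per-channel dilated convolutions, with one dilation factor per input channel."""
    output = [0.0] * len(channels[0])
    for signal, kernel, dilation in zip(channels, kernels, dilations):
        for t, value in enumerate(dilated_conv1d(signal, kernel, dilation)):
            output[t] += value
    return output


# Two identical toy channels, filtered with dilation 1 and dilation 2 respectively.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
box = [1.0, 1.0, 1.0]
y = multidilated_conv1d([x, x], [box, box], dilations=[1, 2])
print(y)
```

The channel with dilation 1 sees only its immediate neighbors, while the channel with dilation 2 reaches two samples away, which is how different dilations cover local and more global context at once.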

In the second paper, we investigate adversarial attacks on audio source separation. We found that it is possible to severely degrade separation performance by adding imperceptible noise to the input mixture under a white-box condition, while under a black-box condition, source separation methods exhibit a certain level of robustness. We believe that this work is important for understanding the robustness of source separation models, as well as for developing content protection methods against the abuse of separated signals.

The last paper investigates posterior collapse in the Gaussian VAE. The VAE is a well-known generative model, valued for its tractability and training stability, but it suffers from the problem of posterior collapse. We investigated the cause of posterior collapse in the Gaussian VAE from the viewpoint of local Lipschitz smoothness. We then proposed a modified ELBO-based objective function that automatically adapts a hyperparameter of the ELBO. Our new objective function prevents over-smoothing of the decoder, that is, posterior collapse.
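For reference, the standard ELBO for a VAE with a Gaussian decoder can be written as follows. This is the textbook form, not the modified objective proposed in the paper. The notation (encoder q_phi, decoder mean mu_theta, decoder variance sigma^2, data dimension D) is our own assumption, and the decoder variance is shown only as one example of a hyperparameter that appears in the ELBO; the text above does not state which hyperparameter is adapted.

```latex
% Evidence lower bound (ELBO) on the log-likelihood of a data point x
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

% Gaussian decoder p_\theta(x \mid z) = \mathcal{N}(x;\, \mu_\theta(z),\, \sigma^2 I), with x \in \mathbb{R}^D
\log p_\theta(x \mid z) =
  -\frac{1}{2\sigma^2}\,\lVert x - \mu_\theta(z) \rVert^2
  - \frac{D}{2}\,\log\big(2\pi\sigma^2\big)
```

In this form, the variance sigma^2 sets the relative weight of the reconstruction term against the KL term, which is why the choice of such a hyperparameter affects how strongly the latent code is used.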

  • Naoya Takahashi

    Sony R&D Center Tokyo, Japan

    He received his Ph.D. from the University of Tsukuba, Japan, in 2020. From 2015 to 2016, he worked at the Computer Vision Lab and Speech Processing Group at ETH Zurich as a visiting researcher. Since joining Sony Corporation in 2008, he has conducted research in the audio, computer vision and machine learning domains. In 2018, he won the Sony Outstanding Engineer Award, which is the highest form of individual recognition for Sony Group engineers. His current research interests include audio source separation, semantic segmentation, video highlight detection, event detection, speech recognition and music technologies.

Technology 5
Video Colorization for Content Revitalization

Sony is highly interested in revitalizing old content, and video colorization is one such effort. However, video colorization is a challenging task because of temporal coherence and user controllability issues. We introduce our unique reference-based video colorization and demonstrate that it can alleviate these issues, helping to revitalize old black-and-white videos.

  • Andrew Shin

    Sony R&D Center Tokyo, Japan

    He received his Ph.D. from The University of Tokyo in 2017, after which he joined Sony. He has been working on the development of Sony's deep learning framework Neural Network Libraries, as well as developing machine learning tools to support content creation for the entertainment business, and conducting core research.

  • Naofumi Akimoto

    Sony R&D Center Tokyo, Japan

    He received his master's degree in engineering from Keio University in 2020, after which he joined Sony. He has been working on research into machine learning technologies for image and video enhancement for Sony's entertainment business.

Technology 6
Neural Architecture Search for Automating The Design of Deep Neural Networks

The application of Deep Neural Networks (DNNs) typically involves a large amount of network engineering. In practice, this is a laborious and time-consuming task that must be performed with great care in order to get the maximum performance and efficiency. In particular, it requires much expertise, experience and intuition. At Sony, we are very interested in hardware-aware Neural Architecture Search (NAS) algorithms that automate the architecture design process. Hardware-aware means that we not only optimize the DNN performance, but also enforce given hardware constraints such as memory, power or latency limits. In this presentation, we present NNablaNAS, Sony's new open-source toolbox for hardware-aware NAS.
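The notion of "hardware-aware" search can be sketched as a constrained selection: among candidate architectures, keep only those that satisfy the hardware limits, then pick the one with the lowest error. The sketch below is a deliberately simple illustration and does not use the NNablaNAS API. The `Candidate` class, the `select_architecture` function, and all numbers are hypothetical; a real search would measure error, latency and memory rather than look them up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    error: float       # validation error, lower is better (hypothetical values)
    latency_ms: float  # measured or estimated inference latency
    memory_mb: float   # measured or estimated memory footprint


def select_architecture(candidates, max_latency_ms, max_memory_mb):
    """Return the lowest-error candidate that meets both hardware constraints, or None."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.memory_mb <= max_memory_mb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.error)


candidates = [
    Candidate("small", error=0.30, latency_ms=5.0, memory_mb=10.0),
    Candidate("medium", error=0.25, latency_ms=12.0, memory_mb=40.0),
    Candidate("large", error=0.20, latency_ms=30.0, memory_mb=120.0),
]

best = select_architecture(candidates, max_latency_ms=15.0, max_memory_mb=50.0)
print(best.name if best else "no feasible architecture")
```

Here the most accurate candidate is rejected because it violates the latency and memory limits, which is the trade-off that hardware-aware NAS methods automate at much larger scale.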

  • Lukas Mauch

    Sony R&D Center Europe, Germany

    He has been with the Sony R&D Center Europe (Germany) since 2019, working on efficient inference methods for Deep Neural Networks. Starting in 2009, he studied at the University of Stuttgart, receiving his M.Sc. in electrical engineering in 2014. From 2014 to 2019, he developed DNN compression methods as a research assistant at the Institute for Signal Processing and System Theory (ISS) at the University of Stuttgart.

Technology 7
Mixed Precision Quantization of Deep Neural Networks

In order to deploy AI applications to edge devices, it is essential to reduce the footprint of DNNs. Sony R&D Center is researching and developing methods and tools for this purpose. In particular, we have been investigating new approaches for training quantized DNNs, which leads to a reduction of the memory, computation and energy footprints. For example, our paper presented at the ICLR 2020 conference, entitled "Mixed Precision DNNs: All you need is a good parametrization", shows how we can train DNNs to optimally distribute the bitwidths across layers given a specific memory budget. The resulting mixed precision MobileNetV2 reduces the required memory by nearly 10x without significant loss of accuracy on the ImageNet classification benchmark.
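The memory effect of assigning different bitwidths to different layers can be illustrated with a small sketch. It uses a plain symmetric uniform quantizer with hand-picked bitwidths; the paper instead learns the quantizer parameters during training, so this is only a simplified illustration. The layer sizes, bitwidths, and function names `quantize_uniform` and `weight_memory_bytes` are hypothetical.

```python
def quantize_uniform(values, bits, max_abs):
    """Symmetric uniform quantization to `bits` bits (bits >= 2) within [-max_abs, max_abs]."""
    levels = 2 ** (bits - 1) - 1  # number of positive levels, e.g. 127 for 8 bits
    step = max_abs / levels
    quantized = []
    for v in values:
        clipped = max(-max_abs, min(max_abs, v))
        quantized.append(round(clipped / step) * step)
    return quantized


def weight_memory_bytes(layer_sizes, bitwidths):
    """Memory needed to store the weights when layer i uses bitwidths[i] bits per weight."""
    return sum(n * b for n, b in zip(layer_sizes, bitwidths)) / 8


# Hypothetical network with three layers (number of weights per layer).
layer_sizes = [1000, 4000, 2000]

full_precision = weight_memory_bytes(layer_sizes, [32, 32, 32])  # 28000.0 bytes
mixed_precision = weight_memory_bytes(layer_sizes, [8, 2, 4])    # 3000.0 bytes

print(f"compression: {full_precision / mixed_precision:.1f}x")
print(quantize_uniform([0.3, -1.5, 0.0], bits=2, max_abs=1.0))
```

Giving the largest layer the smallest bitwidth yields most of the savings in this toy example; deciding which layers can tolerate low precision without hurting accuracy is the part that the training method in the paper addresses.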

  • Fabien Cardinaux

    Sony R&D Center Europe, Germany

    Dr. Fabien Cardinaux leads an R&D team at the Sony R&D Center Europe in Stuttgart (Germany). Prior to joining Sony in 2011, he worked as a postdoc at the University of Sheffield (UK). In 2005, he obtained a PhD from EPFL (Switzerland) for his work on machine learning methods applied to face authentication. His current research interests lie in deep neural network footprint reduction, neural architecture search and audio content creation. Fabien contributes to academic research by regularly publishing and reviewing for major machine learning conferences.

Technology 8
Training of Extremely Large-scale Neural Networks Beyond GPU Memory Limitation

While large neural networks demonstrate higher performance in various tasks, training them is difficult due to limitations on GPU memory size. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks whose size exceeds the allotted GPU memory. Under a given memory budget constraint, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves the overlap between computation and memory transfers. Additionally, we apply a virtual addressing technique, commonly used in operating systems, to the training of neural networks with out-of-core execution, which drastically reduces the memory fragmentation caused by frequent memory transfers. Our algorithm successfully trains several models with larger batch sizes and faster computation than the state-of-the-art method.

  • Akio Hayakawa

    Sony R&D Center Tokyo, Japan

    He received his master's degree from The University of Tokyo in 2018, after which he joined Sony. He has been working on the development of Sony's deep learning framework Neural Network Libraries, especially on optimizing the performance of core algorithms including GPU computation, the graph engine and memory allocation. He has also been conducting core research on content generation to support Sony's entertainment activities.

Business use cases

Case 1
Sony's World's First Intelligent Vision Sensors with AI Processing Functionality Enabling High-Speed Edge AI Processing and Contributing to Building of Optimal Systems Linked with the Cloud

Sony Corporation announced two models of intelligent vision sensors, the first image sensors in the world to be equipped with AI processing functionality. Including AI processing functionality on the image sensor itself enables high-speed edge AI processing and extraction of only the necessary data, which, when using cloud services, reduces data transmission latency, addresses privacy concerns, and reduces power consumption and communication costs.

Fig. Intelligent Vision Sensor

Case 2
Image Recognition Technology on aibo
~ Building Eyes of aibo to Live in Homes ~

aibo uses its eyes (cameras) to recognize its surroundings and modifies its behavior based on what it sees. It identifies its owners and detects friends (other aibos) and toys to play with. An edge-friendly, lightweight algorithm based on deep learning enables rapid recognition so that aibo can interact with people. With its depth sensor, it can reach its toys (e.g., pink balls and bones) while avoiding obstacles. A camera on its back performs SLAM to build a map of the room, which is used for the go-to-charger and patrol applications.

Fig. Sensors/Display device on aibo

Fig. Image Recognition Technology on aibo

Case 3
Neural Network Console

Neural Network Console is an integrated development environment for deep learning that enables full-scale research and development through a GUI. The software has already been used in many products and services within the Sony Group, both to improve the productivity of deep learning research and development and to train deep learning practitioners, for whom demand has expanded rapidly in recent years.

Fig. The New Deep Learning Experience

Publications

First author Sony AI

  • Oscar Chang (Sony AI and Columbia University), Lampros Flokas (Columbia University), Hod Lipson (Columbia University), Michael Spranger (Sony AI)

    Abstract

    SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achievement towards the longstanding AI goal of combining pattern recognition with logical reasoning. In this paper, we clarify SATNet's capabilities by showing that in the absence of intermediate labels that identify individual Sudoku digit images with their logical representations, SATNet completely fails at visual Sudoku (0% test accuracy). More generally, the failure can be pinpointed to its inability to learn to assign symbols to perceptual phenomena, also known as the symbol grounding problem, which has long been thought to be a prerequisite for intelligent agents to perform real-world logical reasoning. We propose an MNIST based test as an easy instance of the symbol grounding problem that can serve as a sanity check for differentiable symbolic solvers in general. Naive applications of SATNet on this test lead to performance worse than that of models without logical reasoning capabilities. We report on the causes of SATNet's failure and how to prevent them.

  • Uchenna Akujuobi (Sony AI and King Abdullah University of Science and Technology), Jun Chen (King Abdullah University of Science and Technology), Mohamed Elhoseiny (KAUST and Stanford University), Michael Spranger (Sony AI), Xiangliang Zhang (King Abdullah University of Science and Technology, Saudi Arabia)

    Abstract

    Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.

by Sony AI members

  • Lemeng Wu (UT Austin), Bo Liu (University of Texas at Austin), Peter Stone (Sony AI and The University of Texas at Austin), Qiang Liu (UT Austin)

    Abstract

    We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best network within a functional neighborhood of the original network that includes a diverse set of candidate network structures. By using Taylor approximation, the optimal network structure in the neighborhood can be found with a greedy selection procedure. We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures that avoid catastrophic forgetting in continual learning. Empirically, firefly descent achieves promising results on both neural architecture search and continual learning. In particular, on a challenging continual image classification task, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.

  • Siddharth Desai (University of Texas at Austin), Ishan Durugkar (University of Texas at Austin), Haresh Karnan (University of Texas at Austin), Garrett Warnell (US Army Research Laboratory), Josiah Hanna (University of Edinburgh), Peter Stone (Sony AI and The University of Texas at Austin)

    Abstract

    We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is particularly important in sim-to-real transfer because simulators inevitably model real-world dynamics imperfectly. In this paper, we show that one existing solution to this transfer problem -- grounded action transformation -- is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for grounded transfer learning. To validate our hypothesis, we derive a new algorithm -- generative adversarial reinforced action transformation (GARAT) -- based on adversarial imitation from observation techniques. We run experiments in several domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
