Awards & Publications

Various results and achievements by Sony's technological developments

Theme: Adversarial Attacks on Audio Source Separation
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: N. Takahashi (Sony Corporation), S. Inoue (University of Tsukuba), Y. Mitsufuji (Sony Corporation)
Details

Despite the excellent performance of neural-network-based audio source separation methods and their wide range of applications, their robustness against intentional attacks has been largely neglected. In this work, we reformulate various adversarial attack methods for the audio source separation problem and intensively investigate them under different attack conditions and target models. We further propose a simple yet effective regularization method to obtain imperceptible adversarial noise while maximizing the impact on separation quality with low computational complexity. Experimental results show that it is possible to largely degrade the separation quality by adding imperceptibly small noise when the noise is crafted for the target model. We also show the robustness of source separation models against a black-box attack. This study provides potentially useful insights for developing content protection methods against the abuse of separated signals and improving the separation performance and robustness.

Theme: All for One and One for All: Improving Music Separation by Bridging Networks
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: R. Sawata, S. Uhlich, S. Takahashi, Y. Mitsufuji (Sony Corporation)
Details

This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes. First, by using MDL we take advantage of both the frequency-domain and time-domain representations of audio signals. Next, we utilize the relationship among instruments by jointly considering them. We do this on the one hand by modifying the network architecture and introducing a CrossNet structure. On the other hand, we consider combinations of instrument estimates by using a new combination loss (CL). MDL and CL can easily be applied to many existing DNN-based separation methods, as they are merely loss functions that are used only during training and do not affect the inference step. Experimental results show that the performance of Open-Unmix (UMX), a well-known and state-of-the-art open source library for music separation, can be improved by utilizing the above schemes. Our modifications of UMX will be open-sourced together with this paper.
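
As a rough illustration of the multi-domain idea, the sketch below adds a time-domain error to a frequency-domain (magnitude spectrum) error. The abstract does not specify the exact loss terms or their weighting, so the use of mean squared error in both domains, the plain DFT, and the weight `alpha` are assumptions made only for this example.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude spectrum of a short signal via a plain O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_domain_loss(estimate, target, alpha=0.5):
    """Weighted sum of a time-domain and a frequency-domain loss.

    alpha is a hypothetical weighting factor, not a value from the paper.
    """
    time_loss = mse(estimate, target)
    freq_loss = mse(dft_magnitudes(estimate), dft_magnitudes(target))
    return alpha * time_loss + (1 - alpha) * freq_loss
```

One reason to combine domains: a sign-flipped estimate has the same magnitude spectrum as the target, so a magnitude-only loss cannot see the error, while the time-domain term can.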

Theme: End-to-end Lyrics Recognition with Voice to Singing Style Transfer
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: S. Basak, S. Agarwal, S. Ganapathy (Indian Institute of Science), N. Takahashi (Sony Corporation)
Details

Automatic transcription of monophonic/polyphonic music is a challenging task due to the lack of large amounts of transcribed data. In this paper, we propose a data augmentation method that converts natural speech to singing voice using a vocoder-based speech synthesizer. This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice. The V2S-based style transfer can generate good-quality singing voice, thereby enabling the conversion of large corpora of natural speech to singing voice, which is useful in building an end-to-end (E2E) lyrics transcription system. In our experiments on monophonic singing voice data, the V2S style transfer provides a significant gain (a relative improvement of 21%) for the E2E lyrics transcription system. We also discuss additional components such as transfer learning and lyrics-based language modeling to improve the performance of the lyrics transcription system.

Theme: ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: K. Shimada, Y. Koyama, N. Takahashi, S. Takahashi, Y. Mitsufuji (Sony Corporation)
Details

Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization. Using two networks dedicated to each task increases system complexity and network size. To address these problems, we propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of a corresponding Cartesian DOA vector. The ACCDOA representation enables us to solve a SELD task with a single target and has two advantages: it avoids the need to balance the two objectives, and it avoids an increase in model size. In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size. The ACCDOA-based SELD system also performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
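
The coupling of activity and direction into one vector can be sketched as follows. This is a minimal illustration for a single event class and a single frame, not the authors' implementation; the function names, the azimuth/elevation convention, and the activity threshold of 0.5 are assumptions made for the example.

```python
import math


def encode_accdoa(active, azimuth_deg, elevation_deg):
    """Build a target vector: activity (0 or 1) times the unit DOA vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    unit = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    scale = 1.0 if active else 0.0
    return tuple(scale * c for c in unit)


def decode_accdoa(vector, threshold=0.5):
    """Recover (is_active, unit DOA vector or None) from one predicted vector.

    The vector length is read as the event activity; the threshold is a
    hypothetical choice for this sketch.
    """
    x, y, z = vector
    length = math.sqrt(x * x + y * y + z * z)
    if length <= threshold:
        return False, None
    return True, (x / length, y / length, z / length)
```

Because a single regression target carries both quantities, no separate detection branch or loss weighting between detection and localization is needed in this formulation.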

Theme: Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: Y. Kashiwagi, E. Tsunoo (Sony Corporation), S. Watanabe (Johns Hopkins University)
Details

Self-attention (SA) based models have recently achieved significant performance improvements in hybrid and end-to-end automatic speech recognition (ASR) systems owing to their flexible context modeling capability. However, it is also known that the accuracy degrades when applying SA to long sequence data. This is mainly due to the length mismatch between the inference and training data because the training data are usually divided into short segments for efficient training. To mitigate this mismatch, we propose a new architecture based on a variant of the Gaussian kernel, which is itself a shift-invariant kernel. First, we mathematically demonstrate that self-attention with shared weight parameters for queries and keys is equivalent to a normalized kernel function. By replacing this kernel function with the proposed Gaussian kernel, the architecture becomes completely shift-invariant with the relative position information embedded using a frame indexing technique. The proposed Gaussian kernelized SA was applied to connectionist temporal classification (CTC) based ASR. An experimental evaluation with the Corpus of Spontaneous Japanese (CSJ) and TEDLIUM 3 benchmarks shows that the proposed SA achieves a significant improvement in accuracy (e.g., from 24.0% WER to 6.0% in CSJ) in long sequence data without any windowing techniques.
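
To make the shift-invariance property concrete, here is a minimal sketch of attention weights computed from a normalized Gaussian kernel over shared query/key vectors. The abstract does not give the exact kernel parametrization or the frame indexing technique, so the bandwidth `sigma` and the function name are assumptions for illustration only.

```python
import math


def gaussian_attention_weights(queries, sigma=1.0):
    """Row-normalized Gaussian kernel between every pair of vectors.

    Queries and keys share the same vectors here; sigma is a hypothetical
    bandwidth parameter.
    """
    weights = []
    for q_i in queries:
        row = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(q_i, q_j)) / (2 * sigma ** 2))
            for q_j in queries
        ]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights
```

Since the kernel depends only on differences between vectors, adding the same offset to every input leaves the attention weights unchanged, which is the shift invariance the abstract refers to.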

Theme: Making Punctuation Restoration Robust and Fast with Multi-Task Learning and Knowledge Distillation
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Technology Category: Audio / Visual, AI / Robotics
Name: M. Hentschel, E. Tsunoo, T. Okuda (Sony Corporation)

Theme: Toward "KANDO" Creation with Immersive Visual Expression (Keynote Addresses)
Academic Conference: The 27th International Display Workshops (IDW)
Technology Category: Audio / Visual
Name: K. Nomoto

Theme: Optical See-Through AR HMD with Spatial Tracking (Invited)
Academic Conference: Laser Display and Lighting Conference 2020
Technology Category: Audio / Visual
Name: H. Mukawa
Details

An optical see-through AR HMD prototype with a small temporal registration error between virtual and real objects was developed. The prototype employs micro-OLED displays to achieve high image quality despite the challenge of reducing the temporal registration error. Our latency compensation technique reduces the temporal registration error to a level small enough for practical use.

Theme: The World Smallest OLED Microdisplay Projection Device Design Methodology
Academic Conference: SID Display Week
Technology Category: Audio / Visual
Name: K. Itonaga, S. Sudo, J. Nishikawa, K. Kimura, H. Uchiyama, M. Yagi, R. Sawano, T. Matsuyama, K. Sasaki, K. Itatsu, T. Tsuchiya, Y. Nomura, Y. Sato
Details

We realized the world's smallest OLED microdisplay projection device (W 10 mm × D 11 mm × H 6 mm). The device consists of just two parts: a high-brightness micro-OLED panel (1,000,000 cd/m²) with controlled light divergence and an optimized lens (F/1.1) for the self-emitting projection device, based on our fundamental studies.

Theme: Warp Square: A 360-degree Visual Experience in 4K Ultra-short-throw Projector Cave
Academic Conference: Laser Display and Lighting Conference
Technology Category: Audio / Visual
Name: N. Ohse
Details

We have developed Warp Square, a 360-degree projector cave using 4K ultra-short-throw projectors and a contrast screen. It has high spatial efficiency and delivers high-quality images with a high contrast ratio. This system has been applied to various multi-person VR experiences. We expect that it will eventually be installed in all homes.

Theme: Ultrashort Throw Lenses with Catadioptric Relay Suitable for Flat and Curved Screens
Academic Conference: The International Society for Optics and Photonics (SPIE) Optics and Photonics 2020
Technology Category: Audio / Visual
Name: J. Nishikawa, M. Nishiyama
Details

In the past two decades, various ultrashort throw lenses (USTs) have been proposed to realize high resolutions and low distortions while miniaturizing optical systems. In this work, ultrashort throw lenses with a catadioptric relay (USTCRs) with rotational symmetry are proposed as a solution. First, the initial design method of the USTCR and the optimized design solution are presented. Although this optical system with a throw ratio of 0.26:1 has resolutions and distortions equivalent to those of a conventional UST with one aspherical mirror, the depth of focus (DOF) on the panel side becomes 1.31 times deeper with appropriate corrections of the field curvature. The total length of the lens is reduced to 0.89 times, the area of the main mirror is reduced to 0.22 times, and the number of lenses is reduced from 13 to 9 elements. Based on these results, several optical designs for the USTCR have been tried and a robust optical design has been proposed. By optimizing the power of each lens group, the decenter sensitivity improved to 0.46 times for parallel decenter and 0.55 times for tilt decenter. By optimizing glass materials, the shift of the focal plane can be suppressed to 0.56 times when the temperature increases by 30 °C. A USTCR suited for a curved screen has also been proposed. By introducing appropriate constraints that correct distortions, high resolutions and low distortions have been realized while maintaining a wide angle of view. Designs of USTCRs suited for various curved screens are currently underway.

Theme: Improving the architecture of the GaN VCSEL
Academic Conference: Compound Semiconductor International Conference
Technology Category: Imaging / Sensing
Name: T. Hamaguchi, H. Nakajima, M. Tanaka, N. Kobayashi, T. Matou, M. Ito, T. Jyokawa, K. Hayashi, M. Ohara, H. Watanabe, Y. Hoshina, R. Koda, K. Yanashima
Details

This talk introduces the latest progress of Sony’s GaN-based visible VCSELs with features such as plane and curved mirrors made of dielectric materials. This novel class of GaN-based VCSELs allows small apertures down to 3 μm and long cavities of more than 20 μm without the occurrence of diffraction loss. These structures have enabled low threshold currents, high efficiency operation, and robust fabrication processes with high lasing yield. The proposed structure is facilitating the production of VCSELs formed on semi-polar plane GaN substrates and arrayed VCSELs, allowing green VCSELs and watt-class blue VCSEL arrays.

Theme: Successful Outcomes in a Stroop Test Modulate the Sense of Agency When the Human Response and the Preemptive Response Actuated by Electrical Muscle Stimulation are Aligned
Academic Conference: Vision Sciences Society 2020
Technology Category: Health / Medical
Name: D. Tajima (Sony Corporation), J. Nishida (University of Chicago), P. Lopes (University of Chicago), S. Kasahara (Sony Computer Science Laboratory, University of Tokyo)
Details

The sense of agency (SoA) refers to the sensation that "I caused the action." Generally, one would expect that a person moved passively by an external force would not feel an SoA. However, Kasahara (2018) found that, with precise timing, an SoA was elicited even for preemptive passive body movement actuated by electrical muscle stimulation (EMS) in a simple reaction time task. This effect, however, has been verified only in the specific situation where the participant and the EMS device share the same goal of action (i.e., both participant and device press a button as fast as possible). Here, we studied a more complex situation where a participant cooperates with an EMS-based device to perform a choice task. In this case, the device’s and the participant’s answers will not always be aligned, e.g., at times the EMS-based device can choose the wrong answer or vice versa. We hypothesized that, if the mechanism underlying the SoA depends on the cognitive process of retrospectively verifying the goal of one’s own action against its outcome, the apparent correct or incorrect outcomes would modulate the participant’s SoA. Participants performed two-alternative forced-choice tasks of the Stroop test by tapping with both hands, and the EMS also actuated the participant’s hand to respond to the task, faster or slower than the participant’s voluntary movement. The EMS played two roles: assistive (forced success) or adversarial (forced failure). The results showed that, when the participant’s response and the EMS response were aligned, the SoA was significantly higher when the outcome was success rather than failure. In contrast, when their responses were not aligned, the SoA was elicited only when the outcome reflected the participant’s response, regardless of the EMS response. These results partly support our hypothesis and imply that an outcome modulates the SoA postdictively only when it is sensed as the result of one’s own action.

Theme: Retargetable AR: Context-aware Augmented Reality in Various Scenes based on 3D Scene Graph
Academic Conference: IEEE International Symposium on Mixed and Augmented Reality
Technology Category: Immersive Experience
Name: T. Tahara, G. Narita, T. Seno, T. Ishikawa
Details

In this paper, we present Retargetable AR, a novel AR framework that yields an AR experience that is aware of scene contexts set in various real environments, achieving natural interaction between the virtual and real worlds. To this end, we characterize scene contexts with relationships among objects in 3D space, not with coordinate transformations. A context assumed by AR content and a context formed by a real environment where users experience AR are represented as abstract graph representations, i.e., scene graphs. From RGB-D streams, our framework generates a volumetric map in which geometric and semantic information of a scene is integrated. Moreover, using the semantic map, we abstract scene objects as oriented bounding boxes and estimate their orientations. With such a scene representation, our framework constructs, in an online fashion, a 3D scene graph characterizing the context of a real environment for AR. The correspondence between the constructed graph and an AR scene graph denoting the context of AR content provides a semantically registered content arrangement, which facilitates natural interaction between the virtual and real worlds. We performed extensive evaluations on our prototype system through quantitative evaluation of the performance of the oriented bounding box estimation, subjective evaluation of the AR content arrangement based on constructed 3D scene graphs, and an online AR demonstration. The results of these evaluations showed the effectiveness of our framework, demonstrating that it can provide a context-aware AR experience in a variety of real scenes.

Theme: Tutorial 8: Point Cloud Compression in MPEG
Academic Conference: IEEE International Conference on Image Processing
Technology Category: Audio / Visual
Name: D. Graziosi, O. Nakagami
Details

Consumer and industry level 3D sensing devices are becoming more common than ever before, increasing the amount of available 3D data. 3D scans can capture the full geometry and details of a 3D scene, and are useful in many applications including virtual reality, 3D video, robotics and geographic information access. Among the many representation formats for 3D data, point clouds offer a tradeoff among ease of acquisition, realistic rendering, and ease of manipulation and processing. However, point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. To address this challenge, the Moving Picture Experts Group (MPEG) initiated a standardization activity on Point Cloud Compression (PCC). This tutorial introduces the technologies developed during the MPEG standardization process for defining an international standard for point cloud compression. The diversity of point clouds in terms of density led to the design of two approaches. The first one, called V-PCC (Video-based Point Cloud Compression), projects the 3D space into a set of 2D patches and encodes them using traditional video technologies. The second one, called G-PCC (Geometry-based Point Cloud Compression), traverses the 3D space directly in order to create the predictors. With the current V-PCC encoder implementation providing a compression ratio of 125:1, a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality. For the second approach, the current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality at a ratio of up to 35:1. By providing high-level immersiveness at currently available bandwidths, the two MPEG standards are expected to enable several applications such as six degrees of freedom (6 DoF) immersive media, virtual reality (VR) / augmented reality (AR), immersive real-time communication, autonomous driving, cultural heritage, and a mix of individual point cloud objects with background 2D/360-degree video.
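
The V-PCC figures quoted above can be checked with a little arithmetic. The sketch below derives the implied uncompressed bitrate from the 125:1 ratio and the 8 Mbit/s target. The frame rate of 30 frames per second, and the reading of "1 million points" as points per frame, are assumptions added for the example and are not stated in the abstract.

```python
def raw_bitrate_mbps(compressed_mbps, ratio):
    """Uncompressed bitrate implied by a compressed bitrate and a ratio."""
    return compressed_mbps * ratio


def bits_per_point(raw_mbps, points_per_frame, frames_per_second):
    """Average uncompressed bits per point per frame."""
    return raw_mbps * 1_000_000 / (points_per_frame * frames_per_second)


# Figures from the abstract: 8 Mbit/s at a 125:1 compression ratio.
raw = raw_bitrate_mbps(8, 125)

# Assumed for illustration: 1 million points per frame at 30 frames/s.
bpp = bits_per_point(raw, 1_000_000, 30)

print(f"implied raw bitrate: {raw} Mbit/s")
print(f"implied raw bits per point: {bpp:.1f}")
```

Under these assumptions the uncompressed stream would be about 1 Gbit/s, which illustrates why compression is described as a prerequisite for mass market applications.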

Theme: Video-based Point Cloud Compression Standardization
Academic Conference: International Conference on 3D Immersion 2020
Technology Category: Audio / Visual
Name: D. Graziosi
Details

Keynote "IMMERSIVE MEDIA: STANDARDS AND CHALLENGES"

Theme: Analytic error control methods for efficient rotation in dynamic binaural rendering of Ambisonics
Academic Conference: The Journal of the Acoustical Society of America
Technology Category: Audio / Visual
Name: T. Magariyachi, Y. Mitsufuji
Details

Dynamic binaural rendering of Ambisonics considering head movements gives a highly realistic sensation to listeners owing to the precise localization and the presence of dynamic cues. Dealing with a head movement is often achieved in the spherical harmonic domain by multiplying Ambisonic signals by a Wigner D-matrix (WDM) with the aim of rotating signals in the opposite direction to the head movement. However, for a vertical rotation, the system requires an enormous computational cost owing to the structure of the WDM, whose number of block diagonal elements increases with the spherical harmonic order of Ambisonics. In this paper, a method is introduced to reduce the computational cost related to the vertical rotation by approximating a WDM with a banded WDM generated from the truncated sum of a power series expression of the WDM. By using an analytically derived upper bound of the approximation error, two methods are devised to determine the minimum bandwidth that achieves the maximum computational cost reduction under the user-preferred threshold. The experimental results show that there is a trade-off between the approximation error and the computational cost and that these methods are applicable to the use case of interest, i.e., dynamic binaural rendering of Ambisonics.

Theme: Sound Quality Improvement of MPEG-H 3D Audio Encoder
Academic Conference: 149th AES Convention
Technology Category: Audio / Visual
Name: A. Kono, H. Honma, T. Chinen
Details

In 2019, Sony launched 360 Reality Audio, which provides a new music experience using object-based spatial audio technology. Object-based audio contains information on time-varying object loudness and location and audio data, which are transmitted to playback devices, and then rendered and played back. It was reported in [1] that object locations affect the subjective sound pressure perception depending on the direction of the sound source. In this e-brief, we present an approach to increase the sound quality by considering the loudness and locations of objects. We perform a subjective listening test for three test items. The results indicate that two items had statistically significant differences in sound quality.

Theme: Directional Dependency of Subjective Sound Pressure Perception on Three-Dimensional Sound
Academic Conference: 148th AES Convention
Technology Category: Audio / Visual
Name: A. Nakai, M. Tsuji, T. Chinen
Details

In 2019, Sony launched 360 Reality Audio, which provides a new music experience using object-based spatial audio technology. In this music experience, sounds arrive from various directions. The direction from which the sound arrives affects the subjective sound pressure sensitivity [1, 2, 3]. Sivonen measured the subjective sound pressure sensitivity in seven directions on the left half of the horizontal plane and on the upper front quarter in the median plane [1]. In this e-brief, the number of directions is increased to 31, which includes the whole horizontal plane and elevations lower than the front center, for measurements with three band-limited signals (i.e., 93 conditions in total). As a result, statistically significant differences are observed in 74 conditions.

Theme: Improving Voice Separation by Incorporating End-to-End Speech Recognition
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing
Technology Category: Audio / Visual
Name: N. Takahashi, M. Kumar Singh (Indian Institute of Technology Bombay), S. Basak (Indian Institute of Science), P. Sudarsanam (Sony India Software Centre), S. Ganapathy (Indian Institute of Science), Y. Mitsufuji
Details

Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data. In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system. The voice separation is conditioned on deep features extracted from E2EASR to cover the long-term dependence of phonetic aspects. Experimental results on speech separation and enhancement tasks on the AVSpeech dataset show that the proposed method significantly improves the signal-to-distortion ratio over the baseline model and even outperforms an audio-visual model that utilizes visual information of lip movements.

Theme: Array-Geometry-Aware Spatial Active Noise Control Based on Direction-of-Arrival Weighting
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing
Technology Category: Audio / Visual
Name: Y. Maeno, Y. Takida, N. Murata, Y. Mitsufuji
Details

Active noise control (ANC) over a sizeable space ideally requires uniformly distributed sensors and secondary sources, which limits the feasibility of practically realizing such systems. In this paper, we propose a direction of arrival (DOA) weighting algorithm for the adaptive filter update, which prioritizes residual error control with respect to the array geometry. Array geometries utilizing multiple horizontal rings, which are considered as more practical than spherical array geometries, are introduced into both sensors and secondary sources. Numerical simulations indicate that the proposed method using multiple-horizontal-ring arrays gives higher noise attenuation performance than the conventional method. The DOA weighting can be intuitively defined on the basis of the secondary source array geometry without any prior information of the primary noise field.

Theme: Metric Learning with Background Noise Class for Few-shot Detection of Rare Sound Events
Academic Conference: 45th International Conference on Acoustics, Speech, and Signal Processing
Technology Category: Audio / Visual
Name: K. Shimada, Y. Koyama, A. Inoue
Details

Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have only been applied to chunks of sounds for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events from query sequences that contain not only the target events but also other events and background noise. It is therefore necessary to prevent false positive reactions to both the other events and the background noise. We propose metric learning with a background noise class for few-shot detection. The contribution is to present the explicit inclusion of background noise as an independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. It provides a feature space where the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks, using DCASE 2017 task2 and ESC-50, show that our proposed method outperforms metric learning without considering the background noise class. The few-shot detection performance is also comparable to that of the DCASE 2017 task2 baseline system, which requires a huge amount of annotated audio data.

Theme: Primary Protection with Antenna Rotation Prediction for Dynamic Spectrum Access
Academic Conference: International Symposium on Wireless Personal Multimedia Communications
Technology Category: 5G / IoT
Name: H. Kuriki, K. Onose, R. Kimura, R. Sawai
Details

In this paper we propose a new primary protection method for dynamic spectrum access, in which a secondary system uses a frequency band assigned to a primary system. Following an investigation in Japan, we consider a scenario where a primary system’s transmission (Tx) station moves along a predefined course or area, and a reception (Rx) station keeps facing its antenna boresight towards the moving Tx station. The proposed method takes the movement of the Tx station into account and predicts a range of the variation of the Rx station’s antenna boresight for accurate interference calculation. The accurate interference calculation brings more operational opportunities to the secondary system under the scenario. We conducted computer simulations to demonstrate the effectiveness of the proposed method in practical urban scenarios and radio propagation models. The simulation results show that the proposed method can increase the number of available secondary base stations by 11.0 times compared to a conventional method.

Theme: Area-based Primary Protection with Antenna Rotation Prediction for Dynamic Spectrum Access
Academic Conference: International Conference on Emerging Technologies for Communications
Technology Category: 5G / IoT
Name: H. Kuriki, K. Onose, R. Kimura, R. Sawai
Details

In this paper we propose a new primary protection method for dynamic spectrum access, in which a secondary system uses a frequency band assigned to a primary system. Following an investigation in Japan, we consider a scenario where the location of a primary system’s reception (Rx) station, which keeps its antenna boresight facing a moving transmission station, is ambiguous. The proposed method performs the antenna rotation prediction and aggregate interference calculation at each protection point defined within an area where the Rx station may be installed. This area-based primary protection brings more operational opportunities to the secondary system under the scenario. We conducted computer simulations to demonstrate the effectiveness of the proposed method in practical urban scenarios and radio propagation models. The simulation results show that the proposed method can increase the number of available secondary base stations by about 24.4 times compared to a conventional method.

Theme: 3D-CNN Based Heuristic Guided Task-Space Planner for Faster Motion Planning
Academic Conference: International Conference on Robotics and Automation
Technology Category: AI / Robotics
Name: R. Terasawa, Y. Ariki, T. Narihira, T. Tsuboi, K. Nagasaka
Details

Motion planning is important in a wide variety of applications such as robotic manipulation. However, it is still challenging to reliably find a collision-free path within a reasonable time. To address the issue, this paper proposes a novel framework which combines a sampling-based planner and deep learning for faster motion planning, focusing on heuristics. The proposed method extends Task-Space Rapidly-exploring Random Trees (TS-RRT) to guide the trees with a "heuristic map" where every voxel has a cost-to-go value toward the goal. It also utilizes fully convolutional neural networks (CNNs) for producing more appropriate heuristic maps, rather than manually-designed heuristics. To verify the effectiveness of the proposed method, experiments for motion planning using a real environment and mobile manipulator are carried out. The results indicate that it outperforms the existing planners, especially in terms of the average planning time with smaller variance.
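
The notion of a heuristic map in which every cell stores a cost-to-go value can be illustrated with a small grid example. The sketch below computes such a map by breadth-first search on a 2D occupancy grid. It is a hand-designed stand-in meant only to show the data structure: the paper works with 3D voxels and produces the map with a CNN, and the function name and grid encoding here are assumptions.

```python
from collections import deque


def cost_to_go_map(grid, goal):
    """Number of 4-connected steps from each free cell to the goal.

    grid: list of rows, 0 = free, 1 = obstacle. goal: (row, col).
    Obstacle and unreachable cells are left as None.
    """
    rows, cols = len(grid), len(grid[0])
    cost = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and cost[nr][nc] is None:
                cost[nr][nc] = cost[r][c] + 1
                queue.append((nr, nc))
    return cost
```

A sampling-based planner could then bias tree growth toward cells with lower values, which is the general role the abstract assigns to the heuristic map.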

Theme: Theoretical derivation and realization of adaptive grasping based on rotational incipient slip detection
Academic Conference: International Conference on Robotics and Automation
Technology Category: AI / Robotics
Name: T. Narita, S. Nagakari, W. Conus, T. Tsuboi, K. Nagasaka
Details

Manipulating objects whose physical properties are unknown remains one of the greatest challenges in robotics. Controlling grasp force is an essential aspect of handling unknown objects without slipping or crushing them. Although extensive research has been carried out on grasp force control, unknown object manipulation is still difficult because conventional approaches assume that object properties (mass, center of gravity, friction coefficient, etc.) are known for grasp force control. One of the approaches to address this issue is incipient slip detection. However, there have been few detailed investigations of robust detection and control of incipient slip in the rotational case. This study derives a theoretical model of incipient slip and proposes a new algorithm to detect it. Additionally, a novel sensor configuration and a grasp force control algorithm based on the derived theoretical model are proposed. Finally, the proposed algorithm is evaluated by grasping objects with different weights and moments including a fragile pastry (éclair).

Theme: Slow EEG fluctuation reflecting behavioral changes by cognitive load
Academic Conference: 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Technology Category: Immersive Experience
Name: N. Sazuka, K. Katsumata, Y. Komoriya, T. Ezaki (Sony Corporation), Takeyuki Oba (Nagoya University), Hideki Ohira (Nagoya University)
Details

We explored changes in behavior and brain function accompanying cognitive load by examining task accuracy, reaction time, and time series EEG power of the alpha band during n-back tasks. Dominant variability of reaction time and an increase in fluctuation of the alpha power at 0.01 Hz were observed in a high cognitive load task (3-back) compared to a low cognitive load task (0-back). Furthermore, enhancement of the alpha power fluctuation at this very low frequency was related to higher task performance, suggesting the ability of the brain to flexibly adjust behavior to the cognitive load imposed by environmental demands.

ThemeHaptic Reproduction by Pneumatic Control Method Based on Load-Displacement Profile
Academic Conference33rd ACM Symposium on User Interface Software and Technology
Technology CategoryImmersive Experience
NameH. Suzuki, A. Nishiike, K. Yoshida, M. Sato, Y. Komoriya, T. Ezaki
Details

It is known that the pressure and contact area change contribute to hard and soft perception as a cutaneous sensation. In this study, we propose a novel method of haptic presentation based on the profiles of physical objects by using the load-displacement measurement. We fabricated a pneumatic haptic system with an elastic membrane that enables controlled pressure stimuli. We verified that the proposed method is capable of reproducing various profiles by comparing with physical objects. As a result, we found that our system could well reproduce the load-displacement profiles of a sponge and a button specimen.

ThemeSimultaneous Measurement of Mental Sweating Dynamics by Electrodermal Activity and Optical Coherence Tomography
Academic Conference42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Technology CategoryImmersive Experience
NameY. Kondo, T. Ishikawa, Y. Komoriya, T. Ezaki(Sony Corporation) Y. Nimura, M. Ohmi (Osaka University)
Details

Electrodermal activity (EDA) reflects the activity of the sympathetic nervous system, and optical coherence tomography (OCT) has revealed sweating dynamics inside the stratum corneum on fingertips. By analyzing OCT and EDA signals of the fingertip captured simultaneously, we clarified that EDA detects the mental sweating response with a latency. We therefore hypothesize that EDA mainly captures, as a change in conductance, the sweat that emerges from the stratum corneum.

ThemeStress state discrimination by multi blood flow information
Academic Conference42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Technology CategoryImmersive Experience
NameT. Ikuta, A. Ito, Y. Komoriya, T. Ezaki
Details

Conventional single-channel laser Doppler flowmetry has limited ability to discriminate between relaxed and stressed states. We develop a multi-channel laser Doppler flowmetry system and propose a novel method that uses the standard deviation of the multi-channel signals to improve stress state discrimination. The effect of the proposed method is evaluated by bio-psychological experiments, and the results show that our method improves the discrimination ability.

ThemeNovel dry biological electrode comprising PEDOT/PSS supported by carbon particle
Academic Conference42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Technology CategoryImmersive Experience
NameR. Sasaki, M. Katsuhara, Y. Komoriya, T. Ezaki
Details

We developed a novel dry biological electrode comprising a biocompatible organic conductive polymer, PEDOT/PSS, supported by carbon particles. Because a thermoplastic elastomer is used as the base material of the electrode, it can be easily molded into different shapes. It is biologically safe and, with its soft texture, comfortable to wear. Moreover, the electrode achieves a low skin-electrode contact impedance comparable to that of PEDOT/PSS polymer. We measured electroencephalogram (EEG) signals using a pin-shaped dry electrode made of the developed material. The results showed that the EEG signal measured by the developed electrode has almost the same quality as that measured by a conventional wet electrode. In addition, no mechanical damage was observed after repeated use because of the composite structure. Therefore, this biological electrode is suitable for EEG measurement in daily life.

ThemeHuman affective-states estimation by a model of meta-level patterns of EEG
Academic Conference7th Annual Society for Affective Science Conference
Technology CategoryImmersive Experience
NameN. Sazuka, Y. Komoriya, T. Ezaki(Sony Corporation) T. Oba(Nagoya University) and H. Ohira(Nagoya University)
Details

We proposed novel feature quantities of electroencephalogram (EEG) to effectively detect affects in humans. A machine learning model using the proposed feature quantities of time series EEG powers showed higher accuracy in estimating affective states of concentration and relaxation than a model using conventional EEG powers. Ten healthy human participants performed a 3-back task with monetary reward to evoke a state of concentration and a 0-back task to evoke relaxation, three times on different days. Their EEG signals from frontal areas were measured during each task period using a wearable device. We first analyzed the time series of EEG powers in the theta and alpha frequency bands in shorter segments. The theta power was greater, and the alpha power was smaller, during the concentration task than during the relaxation task, with statistical significance at most electrodes (p < .05), which confirmed the validity of our experimental manipulation to induce concentration and relaxation. We then proposed the novel feature quantities, the 2nd-order time series of EEG power (the fluctuation of the time series of EEG power), as we found nontrivial fluctuation in the time series of EEG powers during both tasks. The accuracy of estimating the two internal states with the machine learning model (support vector machine) using the proposed 2nd-order EEG powers exceeded that of the model using conventional EEG powers (67.1% to 83.3%). These results suggest that feature quantities reflecting the meta-level pattern of fluctuations of EEG power should be beneficial for estimating affective states in humans.

ThemeThermally assisted hydrolysis and methylation GC-MS of rigidly cross-linked acrylate copolymer
Academic Conference23rd edition of the International Conference on Analytical and Applied Pyrolysis
Technology CategoryImaging / Sensing
NameS. Kato, H. Aoi, H. Ohtani(Nagoya Institute of Technology) S. Sakaigawa, T. Umesato, Y. Nishida, Y. Kudo(Sony Corporation)
Details

Thermally assisted hydrolysis and methylation-gas chromatography-mass spectrometry (THM-GC-MS) using tetramethylammonium hydroxide (TMAH) is a useful technique to characterize condensation polymers consisting of ester and carbonate linkages. As for photocured acrylic resins, monomer compositions, degrees of conversion, and distributions of cross-linking sequences have been successfully analyzed [1-3]. However, in the case of rigidly cross-linked acrylate copolymers, especially those composed of high-molecular-weight monomer units, the reaction efficiency is often insufficient for precise and accurate quantification of the copolymer sample by an ordinary THM-GC-MS method. In this work, such intractable cross-linked acrylate copolymer samples were analyzed by modifying the sampling procedure for THM-GC-MS measurements. The photocured acrylate copolymer samples were prepared from a bifunctional acrylate (9,9-bis[4-(2-acryloyloxyethyloxy)phenyl]fluorene; MW 546.2) and a monofunctional acrylate (2-(9H-carbazole-9-yl)ethyl acrylate; MW 265.1). A weighed piece of the cured copolymer sample was put into TMAH solution (25 wt% in methanol) in a microtube and kept at ambient temperature. After standing for 24 hours, the cured sample was thoroughly dissolved in the TMAH solution. Then an aliquot of the solution was put into a sample cup and subjected to pyrolysis-GC-MS measurement at 400 °C. The characteristic decomposition products for each monomer unit were clearly observed in the chromatograms, and the copolymer composition could be precisely estimated from the relative yields of the observed decomposition products. In addition, a peak of methyl acrylate, which is formed from unreacted acrylate moieties in the photocured copolymer, appeared in the chromatogram, and its relative yield could be correlated with the relative abundance of unreacted acrylate in the cured sample.
The photocured acrylate copolymer samples prepared under various curing conditions were then subjected to the modified THM-GC-MS measurements. The observed results were interpreted in terms of copolymer composition and the degree of conversion of acrylates to clarify the curing process of the photocured acrylate copolymer samples.

ThemeElectronic Structure of Photovoltaic Organic Films and Interfaces Investigated by High-Sensitivity Ultraviolet Photoelectron Spectroscopy
Academic ConferenceThe 6th International Conference on Electronic Materials and Nanotechnology for Green Environment
Technology CategoryImaging / Sensing
NameS. Kimata, Y. Tanaka, H. Ishii(Chiba University)
H. Susa, T. Nishi(Sony Corporation)
Details

To further improve the performance of organic photovoltaic cells, an understanding of the photocarrier generation process is necessary, but information on the electronic structure of photoexcited states and subsequently generated photocarriers has been limited; most experiments investigating them were performed by optical spectroscopies, in which only the energy position relative to the ground state can be determined. The absolute energy position relative to the Fermi level or vacuum level is, however, also important for drawing the complete energy diagram of photovoltaic systems. Photoemission spectroscopy has been widely used to determine the absolute energy position of materials, but its sensitivity is not sufficient to observe photocarriers because of their low density. Our group has developed high-sensitivity UV photoemission spectroscopy (HS-UPS), by which a very weak density of states can be detected. Using this technique, we can expect to observe electrons in weak states such as trap states, exciton states, and photocarrier states in photovoltaic systems. In this study, we report on HS-UPS experiments to directly observe the electronic structure of a C60 single film, a CuPc single film, and the C60/CuPc interface in the dark and under illumination by a solar simulator, in order to try to detect the exciton and carrier states generated by illumination.

ThemeQuantitative electric field imaging in GaN-based heterostructures by DPC STEM
Academic ConferenceThe European Microscopy Congress 2020 (Online)
Technology CategoryImaging / Sensing
NameS. Toyama, T. Seki, Y. Ikuhara(University of Tokyo) Y. Kanitani, Y. Kudo, S. Tomiya(Sony Corporation) N. Shibata(Fine Ceramics Center)
Details

Local electromagnetic fields inside specimens can be directly observed in real space at high spatial resolution by using the differential phase contrast (DPC) imaging method in scanning transmission electron microscopy (STEM). In DPC STEM, deflection of the electron beam due to electromagnetic fields is detected by a segmented detector placed on the bright field (BF) disk region. Using DPC STEM, several electromagnetic field observations have been successfully conducted, including p-n junction electric fields and skyrmions. However, for heterointerfaces with strong distortion, such as those in GaN-based semiconductor devices, intense diffraction contrast affects the quantitative observation of electromagnetic fields by S/TEM. In this study, we attempt to remove diffraction contrast from the electric field image of a GaN/AlGaN interface and to obtain quantitative values of polar electric fields and local carrier distribution in the vicinity of the interface. To reduce diffraction contrast in the electric field image, we adopted the tilt averaging method. Bragg lines in BF disks are very sensitive to the tilt condition of the specimen or the electron beam. On the other hand, the beam deflection by electric fields in the specimen should not be affected by a slight difference in the tilt condition. Therefore, by averaging BF disks acquired under various tilt conditions, diffraction contrast is expected to be eliminated so that only the signals from electric fields remain. From the results of the experiments and the simulations, we confirmed that the remaining diffraction contrast in the averaged DPC images is sufficiently weak to quantify electric fields in the heterostructures. As a result, polarization fields and local carrier distribution in GaN/AlGaN heterointerfaces were acquired. The details of this method and an evaluation of the results will be discussed in the presentation.

ThemeDynamics in Photoelectric Conversion in C60/Pentacene System
Academic Conference28th International Colloquium on Scanning Probe Microscopy (ICSPM28)
Technology CategoryImaging / Sensing
NameO. Takeuchi, T. Kogure, Y. Fujimaki, S. Yoshida, H. Shigekawa (University of Tsukuba) S. Nagai, T. Nishi, Y. Kudo(Sony Corporation)
Details

Photoelectric conversion in a C60/pentacene bilayer was investigated by light-modulated scanning tunneling spectroscopy (LM-STS) and optical pump-probe scanning tunneling microscopy (OPP-STM), to spatially resolve the dynamics of photoelectric conversion in organic solar cells with a singlet fission mechanism. The active layer consists of C60 (30 nm)/pentacene (30 nm) grown at room temperature on an ITO film on a glass substrate. While imaging an STM topography, either the LM-STS measurement or OPP-STM spectroscopy was performed at grid points. For LM-STS, the STM bias voltage was swept while the 532 nm green laser exciting the sample was periodically chopped. For OPP-STM, the amplitude of a ~1 kHz square-wave-like modulation of the delay time between the two 532 nm laser pulses was swept, and the tunnel current was lock-in detected with the modulation. In this study we adopted a non-linear sweep of the delay time across more than six orders of magnitude, as shown in the figure, for the first time with OPP-STM. As a result, we distinguished two independent decay components with non-exponential decay behavior. Although the detailed background of the process is still unknown, since the time scale of the process is much longer than the lifetime of the singlet excitons, ~1 ps, we believe that the delay time dependence and spatial distribution of the measured components will give us a microscopic view of the dynamics of the triplet excitons generated by singlet fission in organic solar cells.

ThemeOlfactory training with Aromastics – olfactory and cognitive effects
Academic ConferenceAnnual conference of the European Chemoreception Research Organization
Technology CategoryHealth / Medical
NameA. Oleszkiewicz, L. Bottesi, M. Pieniak, S. Fujita, N. Krasteva, G. Nelles, T. Hummel
Details

The olfactory system can be successfully rehabilitated with regular, intermittent stimulation during multiple daily exposures to a selected set of odors, i.e. olfactory training (OT). Recent advancements in studies on OT suggest that its beneficial effects exceed olfaction and extend to specific cognitive tasks. So far, studies on OT have used glass bottles or Sniffin’ Sticks, which are not very practical for subjects and are relatively expensive. The aim of our study was to verify whether OT performed with Aromastics, a dedicated odor dispenser from SONY, yields effects similar to those of OT conducted with traditional tools. To this end, we examined 65 subjects (33 females; Mage=58.9; SD=10.67 years), of whom 35 suffered from impaired olfactory function. Subjects were randomly assigned to a standard (twice a day) or intense (four times a day) OT. Olfactory and cognitive measurements were taken before and after OT. Results indicate that OT based on Aromastics may be effective in supporting olfactory rehabilitation and interventions targeted to cognitive function.

ThemeMaximum Likelihood Channel Decoding with Quantum Annealing Machine
Academic ConferenceInternational Symposium on Information Theory and Its Applications
Technology CategoryAI / Robotics
NameN. Ide, T. Asayama, H. Ueno, M. Ohzeki
Details

We formulate maximum likelihood (ML) channel decoding as a quadratic unconstrained binary optimization (QUBO) problem and simulate the decoding on the current commercial quantum annealing machine, D-Wave 2000Q. We prepared two implementations with Ising model formulations, generated from the generator matrix and the parity-check matrix respectively. We evaluated these implementations of ML decoding for low-density parity-check (LDPC) codes, analyzing the number of spins and connections and comparing the decoding performance with belief propagation (BP) decoding and brute-force ML decoding on classical computers. The results show that these implementations are superior to BP decoding for relatively short codes, and while the performance for long codes deteriorates, the implementation from the parity-check matrix formulation still works up to a length of 1k with fewer spins and connections than the generator matrix formulation, owing to the sparseness of the parity-check matrices of LDPC codes.
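The brute-force ML decoding baseline mentioned above can be sketched in a few lines. This sketch assumes a binary symmetric channel with crossover probability below 0.5, in which case ML decoding reduces to picking the codeword closest in Hamming distance. The (7,4) Hamming generator matrix `G` is an illustrative stand-in, not one of the LDPC codes evaluated in the paper, and the function names are hypothetical.

```python
from itertools import product

# Systematic (7,4) Hamming generator matrix, chosen only for illustration.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(message, generator):
    """Multiply the message row vector by the generator matrix over GF(2)."""
    length = len(generator[0])
    return tuple(
        sum(bit * row[j] for bit, row in zip(message, generator)) % 2
        for j in range(length)
    )

def ml_decode(received, generator):
    """Exhaustively search all messages for the nearest codeword."""
    def distance(message):
        codeword = encode(message, generator)
        return sum(a != b for a, b in zip(codeword, received))
    return min(product((0, 1), repeat=len(generator)), key=distance)

message = (1, 0, 1, 1)
codeword = encode(message, G)
corrupted = list(codeword)
corrupted[2] ^= 1  # flip one bit to emulate a channel error
decoded = ml_decode(corrupted, G)
```

The search visits all 2^k messages, which is why exhaustive ML decoding is practical only for short codes and why a QUBO formulation for an annealer is of interest.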

ThemeOlfactory training with Aromastics – olfactory and cognitive effects
Academic ConferenceRhinology
Technology CategoryHealth / Medical
NameA. Oleszkiewicz, L. Bottesi, M. Pieniak, S. Fujita, N. Krasteva, G. Nelles, T. Hummel
Details

The olfactory system can be successfully rehabilitated with a regular, intermittent stimulation during multiple daily exposures to selected sets of odors, i.e. olfactory training (OT). OT has been repeatedly shown to be an effective tool of olfactory performance enhancement. Recent advancements in studies on OT suggest that its beneficial effects exceed olfaction and extend to specific cognitive tasks. So far, studies on OT provided compelling evidence for its effectiveness but there is still a need to search for an optimal OT protocol. The aim of our study was to examine whether OT is more successful when performed more often. We examined 65 subjects (33 females; Mage=58.9; SD=10.67 years) 35 of whom exhibited impaired olfactory function. Subjects were randomly assigned to a standard (twice a day) or intense (four times a day) OT. Olfactory and cognitive measurements were taken before and after OT. OT performed twice a day is more effective in supporting olfactory rehabilitation and interventions targeted to verbal semantic fluency than OT performed four times a day. OT regimen may be a factor influencing its effectiveness. OT regimen that is easy to fit in daily routine leads to higher OT compliance that results in significant improvement of olfactory and verbal functions.

ThemeSmells influence perceived pleasantness but not memorization of a visual virtual environment
Academic Conferencei-Perception
Technology CategoryHealth / Medical
NameA. Sabiniewicz, E. Schaefer, C. Guducu, C. Manesse, M. Bensafi, N. Krasteva, G. Nelles, T. Hummel
Details

The aim of the present study was to investigate whether the perception of still scenes in a virtual environment (360° panoramas) in congruent vs. incongruent conditions can be influenced by odors, with the assumption that congruent scenes would become more pleasant and memorable. Ninety healthy participants were tested under two experimental VR environment conditions, a rose garden and an orange basket, and a control condition (black screen). The subjects provided ratings and descriptions of the content of VR scenes without being exposed again to odors or VR environments. Results showed that virtual scenarios were remembered as more pleasant when presented with congruent odors, that participants used more descriptors in congruent scenarios than in incongruent scenarios, and that rose odor was remembered as more pleasant when presented within congruent scenarios. Taken together, these findings show that olfactory stimuli in congruent vs. incongruent conditions can modulate the perception of the pleasantness of visual scenes (at both affective and verbal levels) but not the memorization, enhancing our understanding of multisensory integration processing in virtual environments.

ThemeMini-RCM: Origami-Inspired Miniature Manipulator for Microsurgery
Academic ConferenceNature Machine Intelligence
Technology CategoryAI / Robotics
NameH. Suzuki, R. J. Wood
Details

The use of a structure with a remote fixed point around which a mechanism can rotate is called remote centre of motion (RCM). The technique is widely used in minimally invasive surgery to avoid excess force on the incision site during the robot’s motion. Here we describe the design, fabrication and characterization of an origami-inspired miniature RCM manipulator for teleoperated microsurgery (the mini-RCM has mass 2.4 g and size 50 mm × 70 mm × 50 mm), which is actuated by three independently controlled linear actuators with concomitant sensing (each mini-LA has mass 0.41 g and size 28 mm × 7 mm × 3.6 mm). The mini-RCM has a payload capacity of approximately 27 mN and attains a positional precision of 26.4 μm. We demonstrate its potential utility as a precise tool for teleoperated microsurgery by performing 0.5-mm-square tracing and micro-cannulation teleoperated microsurgical procedures under a microscope. Teleoperation using the mini-RCM reduced the deviation from the desired trajectory by 68% compared to manual operation. In addition, the mini-RCM allows gravity compensation and back drivability for safety. Its compact, simple structure facilitates manufacture.

ThemeImpact of Reorientation Barrier on Orientation of Organic Molecules during Film Growth
Academic ConferencePhysical Review Materials
Technology CategoryImaging / Sensing
NameS. Nagai, Y. Inaba, T. Nishi, H. Kobayashi, S. Tomiya
Details

We used p-polarized multiple-angle incidence resolution spectrometry (pMAIRS) to investigate a collective orientation barrier (COB) in the growth of organic semiconductor (OSC) films. We demonstrate a temperature-dependent variation in the growth of pentacene (PEN) on SiO2 films as a model system. The molecular orientation varied from lying to standing as the growth temperature increased. This change suggests that the formation of a standing orientation is thermally activated compared with the lying state. The nucleation of standing-oriented islands occurs by molecular self-assembly at sufficiently high temperatures. Conversely, molecules deposit in a flat-lying state at low temperatures because the kinetic barrier hinders reorientation from lying to standing states. The COB is defined as a collective energy barrier for the reorientation processes throughout the growth. The COB is quantitatively estimated from an Arrhenius plot of the probability of forming a standing orientation, derived from the dichroic ratio measured by pMAIRS. We found that the COB in the growth of PEN on the SiO2 system was approximately 0.02 eV, which is a key parameter for determining the molecular orientation. A quantitative evaluation of the COB will be applicable to other systems and enable effective control over the molecular orientation of OSC films.
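Estimating a barrier from an Arrhenius plot amounts to a straight-line fit of ln p against 1/T, whose slope is -E/k_B. The sketch below assumes a simple Arrhenius form p = A exp(-E / (k_B T)). The temperatures, the prefactor 0.9, and the probabilities are synthetic values generated from that formula for illustration only; they are not measurements from the paper, and `arrhenius_barrier` is a hypothetical helper.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_barrier(temperatures_k, probabilities):
    """Least-squares slope of ln(p) versus 1/T, converted to a barrier in eV."""
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(p) for p in probabilities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return -slope * K_B_EV

# Synthetic data: probabilities generated with a 0.02 eV barrier (assumed).
temps = [250.0, 300.0, 350.0, 400.0]
synthetic_p = [0.9 * math.exp(-0.02 / (K_B_EV * t)) for t in temps]
barrier = arrhenius_barrier(temps, synthetic_p)
```

Because the synthetic points lie exactly on the Arrhenius line, the fit recovers the barrier used to generate them; real pMAIRS-derived probabilities would scatter around the line.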

ThemeWannier-like delocalized exciton generation in C60 fullerene clusters simulated using large-scale time dependent density functional theory
Academic ConferenceThe Journal of Physical Chemistry C
Technology CategoryImaging / Sensing
NameH. Kobayashi, S. Hattori, R. Shirasawa, S. Tomiya
Details

C60 fullerene is widely used in organic devices; however, there are still unexplained phenomena, such as the strong light absorption around 2.8 eV of C60 films and the favorable device performance of organic photovoltaics (OPVs) using C60-rich bulk heterojunctions. We studied the excited states of C60 fullerene clusters using large-scale time-dependent density functional theory, taking thermal vibrations into account. A strong absorption peak around 2.8 eV appeared because of aggregation, and we found that Wannier-like delocalized excitons spread over multiple molecules were generated in this energy region. This is contrary to the accepted theory that only Frenkel or charge transfer (CT) excitons are generated in organic materials. It is considered that the delocalized excitons with energies greater than the electrical gap (Eg) become CT excitons after thermalization, whereas those with energies lower than Eg become either CT excitons by obtaining thermal energy or Frenkel excitons after thermalization. In the exciton dissociation process, there is an essential channel in which the delocalized excitons are generated first and subsequently become CT excitons. The delocalized exciton generation can explain both the strong absorption around 2.8 eV and the enhanced OPV performance using C60-rich bulk heterojunctions.

ThemeEffect of Dielectric Fabrication Techniques on Graphene Gating
Academic ConferenceThe Journal of Physical Chemistry C
Technology CategoryImaging / Sensing
NameT. Yu, Vasyl G. Kravets, S. Imaizumi, A. N. Grigorenko
Details

One of the exciting features of graphene is the possibility of affecting its electrical, optical, and chemical properties by gating, that is, by application of an electric field. This requires reasonably large fields (at the level of 1 V/nm necessary to induce relevant electron density changes) applied over a gating dielectric material. At these fields, most dielectrics show some conduction, which leads to an important question: what is the best dielectric to gate graphene? Here, we show that this question is imprecise, as a dielectric material produced by different fabrication methods can exhibit dramatically different gating properties. Namely, we show that two oxide dielectrics (hafnia and alumina) result in positive hysteresis of graphene gating characteristics when fabricated by atomic layer deposition and negative hysteresis when fabricated by electron beam evaporation. We attribute this behavior to the stoichiometry of the samples and oxygen ion migration. It implies that oxide dielectrics should be avoided in graphene gated devices working at room temperature.

ThemeThe Prospects of Sensory Transmission and UX in the 5G Era
Academic ConferenceNew Breeze 2020 Autumn
Technology Category5G / IoT
NameY. Ohgishi, K. Yamamoto

ThemeMEMS Gyro Array Employing Array Signal Processing for Interference and Outlier Suppression
Academic Conference2020 IEEE International Symposium on Inertial Sensors and Systems (INERTIAL)
Technology CategoryImaging / Sensing
NameH. Kamata, M. Kimishima, T. Sawada, Y. Suga, H. Takeda, K. Yamashita (Sony Corporation)
S. Mitani (Japan Aerospace Exploration Agency)
Details

This paper first discusses two practical problems that cannot be ignored when forming a compactly integrated, low-cost MEMS IMU array from low-power Coriolis-vibration-type MEMS devices: mutual vibration interference noise caused by the close arrangement of the MEMS devices, and outlier noise caused by random telegraph signal noise. We then propose an array signal processing filter, easily implementable on a small FPGA, that recovers gyroscope performance (angle random walk and bias instability) by suppressing the interference noise and removing the outlier noise dynamically. We fabricated an array board of 32 consumer MEMS IMUs without an IMU screening process, implemented the proposed filter, and evaluated its performance. As a result, despite mutual interference and the presence of a poorly performing MEMS gyro sensor, we achieved gyroscope performance (angle random walk of 0.7 mdeg/s/√Hz and bias instability of 0.5 deg/h) close to the ideal array gain of 1/√32.
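The ideal array gain of 1/√32 quoted above follows from averaging independent noise. As a brief sketch, assume the N gyros have uncorrelated noise with equal standard deviation σ; this is an idealization that the interference and outliers discussed in the abstract violate, which is why a dedicated filter is needed to approach it.

```latex
\bar{\omega} = \frac{1}{N}\sum_{i=1}^{N}\omega_i ,
\qquad
\operatorname{Var}(\bar{\omega}) = \frac{\sigma^{2}}{N}
\;\Longrightarrow\;
\sigma_{\bar{\omega}} = \frac{\sigma}{\sqrt{N}} ,
\qquad
N = 32:\ \frac{1}{\sqrt{32}} \approx 0.177 .
```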

ThemeRed-Enhanced Laser-Phosphor Light Source with Quantum Dot Conversion Layer
Academic ConferenceSID Display Week
Technology CategoryAudio / Visual
NameT. Kaji, H. Morita, C. Wall, D. Chercka, N. Krasteva, I. Kobayashi
Details

We developed a red-enhanced laser-phosphor light source with a quantum dot conversion layer for projector applications. This device uses a dual-layer wheel structure, in which a quantum dot layer is underneath a Ce:YAG phosphor layer. We demonstrated a QD-based light source with high brightness and a wide color gamut.

ThemeData-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service
Academic ConferenceRecommender Systems 2020 Workshop REVEAL
Technology CategoryData Analytics / Cloud Computing
NameY. Sato (CFMLab.),
T. Udagawa, K. Tateno (Sony corp.)
Details

Off-policy evaluation (OPE) is the method that attempts to estimate the performance of decision making policies using historical data generated by different policies without conducting costly online A/B tests. Many OPE methods with theoretical backgrounds have been proposed, including Direct Method (DM), Inverse Probability Weighting (IPW), and Doubly Robust (DR).
One emerging challenge with this trend is that a suitable estimator can be different for each application setting. For example, DM has low variance but has a large bias, and thus, performs better in small sample settings. On the other hand, IPW has a low bias but has a large variance, and thus reveals better performance in large sample settings. It is often unknown for practitioners which estimator to use for their specific applications and purposes. To find out a suitable estimator among many candidates, we propose a data-driven estimator selection procedure for off-policy policy performance estimators as a practical solution.
As a proof of concept, we use our procedure to select the best estimator to evaluate coupon treatment policies on a real-world online content delivery service.
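The three estimators named above can be written compactly for a logged bandit dataset. This is a minimal sketch under stated assumptions: each log entry is a tuple of (context, action, reward, logging propensity), the target policy and reward model are supplied as plain functions, and all names and the toy coupon data are hypothetical. It does not implement the estimator selection procedure proposed in the paper.

```python
def direct_method(logs, target_prob, reward_model, actions):
    """DM: average the model's predicted reward under the target policy."""
    total = 0.0
    for context, _, _, _ in logs:
        total += sum(
            target_prob(context, a) * reward_model(context, a) for a in actions
        )
    return total / len(logs)

def inverse_probability_weighting(logs, target_prob):
    """IPW: reweight observed rewards by target/logging probability ratios."""
    total = 0.0
    for context, action, reward, propensity in logs:
        total += target_prob(context, action) / propensity * reward
    return total / len(logs)

def doubly_robust(logs, target_prob, reward_model, actions):
    """DR: DM baseline plus an importance-weighted correction of its residual."""
    total = 0.0
    for context, action, reward, propensity in logs:
        baseline = sum(
            target_prob(context, a) * reward_model(context, a) for a in actions
        )
        weight = target_prob(context, action) / propensity
        total += baseline + weight * (reward - reward_model(context, action))
    return total / len(logs)

# Toy logs collected by a uniform-random logging policy (assumed data).
actions = ("coupon", "none")
logs = [
    ("new", "coupon", 1.0, 0.5),
    ("new", "none", 0.0, 0.5),
    ("loyal", "coupon", 1.0, 0.5),
    ("loyal", "none", 1.0, 0.5),
]
always_coupon = lambda context, action: 1.0 if action == "coupon" else 0.0
crude_model = lambda context, action: 0.5  # deliberately biased reward model

dm_value = direct_method(logs, always_coupon, crude_model, actions)
ipw_value = inverse_probability_weighting(logs, always_coupon)
dr_value = doubly_robust(logs, always_coupon, crude_model, actions)
```

With the deliberately crude reward model, DM inherits the model's bias, while IPW and DR agree with each other on this toy data; on small real datasets the variance of the importance weights can reverse that ranking, which is the tradeoff the abstract describes.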

Theme60 GHz Multipath Propagation Analysis and Inference for an Indoor Scenario
Academic ConferenceIEEE Vehicular Technology Conference FALL
Technology CategoryProfessional Solutions
NameS. Balakrishnan, L. Xin, M. Abouelseoud, K. Sakoda, K. Tanaka(Sony Corporation), S. Hashim, A. Shah, C. Slezak(New York University)
Details

Channel measurements at millimeter wave (mmWave) frequencies are typically carried out using directional antennas to overcome the high path loss at these frequencies. Unlike traditional directional channel measurements using narrow-beam horn antennas, where ray paths can be uniquely mapped to a direction with sufficiently high accuracy, phased arrays with irregular beam patterns add complexity in determining the arrival statistics of ray paths. In this work, we propose a systematic method to infer the ray path characteristics using knowledge of the beam patterns used by the phased array. We leverage channel measurement data obtained from an extensive 60 GHz measurement campaign performed in an indoor living room scenario and extract the ray path information from the measured data with high reliability. We also verify our findings through ray tracing simulation.

ThemeIMMERSIVE MEDIA: STANDARDS AND CHALLENGES
Academic Conference2020 International Conference on 3D Immersion
Technology CategoryAudio / Visual
NameD. Graziosi
Details

Immersive Media denotes media formats that provide the end user with experiences that are closer to reality. In order to achieve the “immersion” factor, a possible solution needs to consider several aspects, including coding and systems-level aspects. For more than 30 years, MPEG has been developing widely deployed, cross-industry standards. For immersive media, MPEG has created a project called MPEG-I, which comprises an immersive media solution with several parts, including system architectures, coding, and quality evaluation. In this paper, we briefly explain how the project is structured, describe some of the recently published international standards, and discuss emerging challenges and opportunities.

ThemePoint Cloud Compression in MPEG
Academic Conference2020 IEEE International Conference on Image Processing (ICIP)
Technology CategoryAudio / Visual
NameM. Preda, D. Graziosi, O. Nakagami, K. Mammou
Details

Consumer and industry level 3D sensing devices are becoming more common than ever before, increasing the amount of available 3D data. 3D scans can capture the full geometry and details of a 3D scene, and are useful in many applications including virtual reality, 3D video, robotics and geographic information access. Among the many representation formats for 3D data, point clouds offer a tradeoff among ease of acquisition, realistic rendering, and ease of manipulation and processing. However, point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. To address this challenge, the Moving Picture Experts Group (MPEG) initiated a standardization activity on Point Cloud Compression (PCC).

This tutorial introduces the technologies developed during the MPEG standardization process for defining an international standard for point cloud compression. The diversity of point clouds in terms of density led to the design of two approaches. The first one, called V-PCC (Video-based Point Cloud Compression), consists of projecting the 3D space into a set of 2D patches and encoding them using traditional video technologies. The second one, called G-PCC (Geometry-based Point Cloud Compression), traverses the 3D space directly in order to create the predictors.

With the current V-PCC encoder implementation providing a compression ratio of 125:1, a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality. For the second approach, the current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality at ratios of up to 35:1.
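
As a rough check on the V-PCC figures above, the following sketch works out the uncompressed bitrate implied by a 125:1 ratio at 8 Mbit/s, and the corresponding bits per point. The frame rate of 30 frames per second is an assumption made here for illustration (the abstract does not state one), and all variable names are hypothetical.

```python
# Figures taken from the abstract.
COMPRESSED_MBPS = 8.0          # encoded bitrate in Mbit/s
COMPRESSION_RATIO = 125.0      # 125:1
POINTS_PER_FRAME = 1_000_000   # 1 million points per frame

# Assumption (not in the abstract): the dynamic point cloud runs at 30 frames/s.
ASSUMED_FPS = 30.0

# Uncompressed bitrate implied by the ratio: 8 Mbit/s * 125 = 1000 Mbit/s.
raw_mbps = COMPRESSED_MBPS * COMPRESSION_RATIO

# Bits spent per point before and after compression, under the assumed frame rate.
points_per_second = POINTS_PER_FRAME * ASSUMED_FPS
raw_bits_per_point = raw_mbps * 1e6 / points_per_second
compressed_bits_per_point = COMPRESSED_MBPS * 1e6 / points_per_second

print(f"raw bitrate: {raw_mbps:.0f} Mbit/s")
print(f"raw bits/point: {raw_bits_per_point:.2f}")
print(f"compressed bits/point: {compressed_bits_per_point:.3f}")
```

Under that assumed frame rate, the raw stream works out to roughly 33 bits per point and the compressed stream to roughly 0.27 bits per point; a different frame rate scales both numbers proportionally.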

By providing high-level immersiveness at currently available bandwidths, the two MPEG standards are expected to enable several applications such as six Degrees of Freedom (6 DoF) immersive media, virtual reality (VR) / augmented reality (AR), immersive real-time communication, autonomous driving, cultural heritage, and a mix of individual point cloud objects with background 2D/360-degree video.

ThemeVideo-Based Coding Of Volumetric Data
Academic Conference2020 IEEE International Conference on Image Processing (ICIP)
Technology CategoryAudio / Visual
NameD. Graziosi, B. Kroon
Details

New standards are emerging for the coding of volumetric 3D data such as immersive video and point clouds. Some of these volumetric encoders similarly utilize video codecs as the core of their compression approach, but apply different techniques to convert volumetric 3D data into 2D content for subsequent 2D video compression. Currently in MPEG there are two activities that follow this paradigm: ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC) and ISO/IEC 23090-12 MPEG Immersive Video (MIV). In this article we propose that both standards define 2D projection as a common transmission format. We then describe a procedure based on camera projections that is applicable to both standards to convert 3D information into 2D images for efficient 2D compression. Results show that our approach successfully encodes both point clouds and immersive video content with the same performance as the current test models that MPEG experts developed separately for the respective standards. We conclude the article by discussing further integration steps and future directions.
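
To illustrate the general idea of turning 3D data into a 2D image by projection, here is a minimal sketch. It assumes an orthographic camera looking along the z axis and an integer pixel grid; the function names and the keep-the-nearest-point rule are assumptions made for illustration and are not the procedure defined in the paper or in either standard.

```python
import math

def project_to_depth_image(points, width, height):
    """Orthographic projection along +z (assumed camera model).

    Each (x, y, z) point lands on pixel (floor(x), floor(y)); when several
    points hit the same pixel, only the one nearest to the camera is kept.
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        u, v = math.floor(x), math.floor(y)
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth

def reconstruct_points(depth):
    """Inverse mapping: one 3D point per occupied pixel of the depth image."""
    return [(u, v, z)
            for v, row in enumerate(depth)
            for u, z in enumerate(row)
            if z is not None]

# Hypothetical toy cloud: two points share pixel (0, 0), so the farther one is hidden.
cloud = [(0, 0, 5.0), (0, 0, 3.0), (1, 1, 7.0)]
depth_image = project_to_depth_image(cloud, width=2, height=2)
recovered = reconstruct_points(depth_image)
```

The resulting depth image is an ordinary 2D array that a video codec could compress. The hidden point shows why a single projection is lossy and why practical schemes use several projections or patches.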

ThemeV-PCC Component Synchronization for Point Cloud Reconstruction
Academic Conference2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)
Technology CategoryAudio / Visual
NameD. Graziosi, A. Tabatabai, V. Zakharchenko, A. Zaghetto
Details

For a V-PCC system to be able to reconstruct a single instance of the point cloud, one V-PCC unit must be transferred to the 3D point cloud reconstruction module. It is, however, required that all the V-PCC components, i.e. occupancy map, geometry, atlas and attribute, be temporally aligned. This, in principle, could pose a challenge since the temporal structures of the decoded sub-bitstreams are not coherent across V-PCC sub-bitstreams. In this paper we propose an output delay adjustment mechanism for the decoded V-PCC sub-bitstreams to provide synchronized V-PCC components as input to the point cloud reconstruction module.

ThemeAn overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC)
Academic ConferenceAsia Pacific Signal and Information Processing Association
Technology CategoryAudio / Visual
NameD. Graziosi, O. Nakagami, S. Kuma, Z. Alexandre, A. Tabatabai, T. Suzuki
Details

This article presents an overview of the recent standardization activities for Point Cloud Compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media including Virtual/Augmented Reality (VR/AR), Immersive Telepresence, Autonomous Driving and Cultural Heritage Archival. The international standard body for media compression, also known as the Moving Picture Experts Group (MPEG), is planning to release in 2020 two PCC standard specifications: Video-based PCC (V-PCC) and Geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performance. Moreover, we will also discuss certain unique aspects of point cloud compression.

ThemeSkin-based Identification from Multispectral image data using CNNs
Academic ConferenceMeeting on Image Recognition and Understanding
Technology CategoryImaging / Sensing
NameT. Uemori, A. Ito, Y. Moriuchi, A. Gatto, J. Murayama
Details

User identification from hand images only is still a challenging task. In this paper, we propose a new biometric identification system based solely on a skin patch from a multispectral image. The system utilizes a novel modified 3D CNN architecture which takes advantage of multispectral data. We demonstrate the application of our system for the example of human identification from multispectral images of hands. To the best of our knowledge, this paper is the first to describe a real-time human identification system using hands that is pose-invariant and robust to overlapping. Additionally, we provide a framework to optimize the required spectral bands for the given spatial resolution limitations.

ThemeJoint graph-based depth refinement and normal estimation
Academic ConferenceConference on Computer Vision and Pattern Recognition
Technology CategoryAudio / Visual
NameM. Rossi, ME. Gheche, A. Kuhn, P. Frossard
Details

Depth estimation is an essential component in understanding the 3D geometry of a scene, with numerous applications in urban and indoor settings. These scenes are characterized by a prevalence of human-made structures, which in most cases are either inherently piece-wise planar or can be approximated as such. In these settings, we devise a novel depth refinement framework that aims at recovering the underlying piece-wise planarity of the inverse depth map. We formulate this task as an optimization problem involving a data fidelity term that minimizes the distance to the input inverse depth map, as well as a regularization that enforces a piece-wise planar solution. As for the regularization term, we model the inverse depth map as a weighted graph between pixels. The proposed regularization is designed to estimate a plane automatically at each pixel, without any need for an a priori estimation of the scene planes, and at the same time it encourages similar pixels to be assigned to the same plane. The resulting optimization problem is efficiently solved with the Adam algorithm. Experiments show that our method leads to a significant improvement in depth refinement, both visually and numerically, with respect to state-of-the-art algorithms on the Middlebury, KITTI and ETH3D multi-view stereo datasets.
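
The structure of such a problem (a data fidelity term plus a regularizer, minimized with Adam) can be sketched on a toy 1D signal. This is not the paper's graph-based regularization: as a stand-in, the sketch penalizes the absolute second difference of a 1D signal, which favors piece-wise linear solutions. The signal, the weight `lam`, and all function names are assumptions made for illustration.

```python
import math

def objective(x, d, lam):
    """Data fidelity ||x - d||^2 plus lam * sum of |second differences|."""
    data = sum((xi - di) ** 2 for xi, di in zip(x, d))
    reg = sum(abs(x[i - 1] - 2 * x[i] + x[i + 1]) for i in range(1, len(x) - 1))
    return data + lam * reg

def subgradient(x, d, lam):
    """A subgradient of the objective (the absolute value is non-smooth at 0)."""
    g = [2 * (xi - di) for xi, di in zip(x, d)]
    for i in range(1, len(x) - 1):
        c = x[i - 1] - 2 * x[i] + x[i + 1]
        s = (c > 0) - (c < 0)  # sign of the second difference
        g[i - 1] += lam * s
        g[i] -= 2 * lam * s
        g[i + 1] += lam * s
    return g

def adam_refine(d, lam=1.0, lr=0.01, steps=500, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam updates with bias correction, started from the input signal."""
    x = list(d)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for t in range(1, steps + 1):
        g = subgradient(x, d, lam)
        for i, gi in enumerate(g):
            m[i] = b1 * m[i] + (1 - b1) * gi
            v[i] = b2 * v[i] + (1 - b2) * gi * gi
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical noisy ramp standing in for one row of an inverse depth map.
noisy = [0.0, 1.2, 1.9, 3.1, 3.9, 5.2, 6.0]
refined = adam_refine(noisy)
print(objective(noisy, noisy, 1.0), objective(refined, noisy, 1.0))
```

The refined signal trades a small deviation from the input for a much smaller curvature penalty. The paper's method applies the same kind of trade-off to 2D inverse depth maps with a learned-free, graph-weighted regularizer.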

ThemeBP-MVSNet: Belief-Propagation-Layers for Multi-View-Stereo
Academic ConferenceInternational Conference on 3D Vision
Technology CategoryAudio / Visual
NameC. Sormann, P. Knöbelreiter, A. Kuhn, M. Rossi, T. Pock, F. Fraundorfer
Details

In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based Multi-View-Stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we propose to extend the BP layer and add what is necessary to successfully use it in the MVS setting. We therefore show how we can calculate a normalization based on the expected 3D error, which we can then use to normalize the label jumps in the CRF. This is required to make the BP layer invariant to different scales in the MVS setting. In order to also enable fractional label jumps, we propose a differentiable interpolation step, which we embed into the computation of the pairwise term. These extensions allow us to integrate the BP layer into a multi-scale MVS network, where we continuously improve a rough initial estimate until we get high quality depth maps as a result. We evaluate the proposed BP-MVSNet in an ablation study and conduct extensive experiments on the DTU, Tanks and Temples and ETH3D data sets. The experiments show that we can significantly outperform the baseline and achieve state-of-the-art results.

ThemeDeep-MVS: Deep Confidence Prediction for Multi-View Stereo Reconstruction
Academic ConferenceInternational Conference on 3D Vision
Technology CategoryAudio / Visual
NameA. Kuhn, C. Sormann, M. Rossi, F. Fraundorfer, O. Erdler
Details

Deep Neural Networks (DNNs) have the potential to improve the quality of image-based 3D reconstructions. However, the use of DNNs in the context of 3D reconstruction from large and high-resolution image datasets is still an open challenge, due to memory and computational constraints. We propose a pipeline which takes advantage of DNNs to improve the quality of 3D reconstructions while being able to handle large and high-resolution datasets. In particular, we propose a confidence prediction network explicitly tailored for Multi-View Stereo (MVS) and we use it for both depth map outlier filtering and depth map refinement within our pipeline, in order to improve the quality of the final 3D reconstructions. We train our confidence prediction network on (semi-)dense ground truth depth maps from publicly available real world MVS datasets. With extensive experiments on popular benchmarks, we show that our overall pipeline can produce state-of-the-art 3D reconstructions, both qualitatively and quantitatively.

ThemeSemi-supervised Deep Learning Techniques for Spectrum Reconstruction
Academic ConferenceInternational Conference on Pattern Recognition
Technology CategoryImaging / Sensing
NameA. Simonetto, V. Parret, P. Sartor, P. Zanuttigh, A. Gatto
Details

State-of-the-art approaches for the estimation of hyperspectral images (HSI) from RGB data are mostly based on deep learning techniques but, owing to the lack of training data, their performance is limited to the uncommon scenarios in which a large hyperspectral database is available. In this work we present a family of novel deep learning schemes for hyperspectral data estimation that work when the hyperspectral information at our disposal is limited. Firstly, we introduce a learning scheme exploiting a physical model based on the backward mapping to the RGB space and total variation regularization that can be trained with a limited number of hyperspectral images. Then, we propose a novel semi-supervised learning scheme that works even with just a few pixels labeled with hyperspectral information. Finally, we show that the approach can be extended to a transfer learning scenario. The proposed techniques achieve impressive performance while requiring only some hyperspectral images or just a few pixels for training.

ThemeThe Upside of being a Digital Pharma Player
Academic ConferenceDrug Discovery Today
Technology CategoryAI / Robotics
NameA. Schuhmacher, A. Gatto, M. Hinder, M. Kuss, O. Gassmann
Details

We investigated the state of artificial intelligence (AI) in pharmaceutical research and development (R&D) and outline here a risk and reward perspective regarding digital R&D. Given the novelty of the research area, a combined qualitative and quantitative research method was chosen, including the analysis of annual company reports, investor relations information, patent applications, and scientific publications of 21 pharmaceutical companies for the years 2014 to 2019. As a result, we can confirm that the industry is in an ‘early mature’ phase of using AI in R&D. Furthermore, we can demonstrate that, despite the efforts that need to be managed, recent developments in the industry indicate that it is worthwhile to invest to become a ‘digital pharma player’.

ThemeIteratively Training Look-Up Tables for Network Quantization
Academic ConferenceIEEE Journal of Selected Topics in Signal Processing
Technology CategoryAI / Robotics
NameF. Cardinaux, S. Uhlich, K. Yoshiyama, J. Alonso Garcia, L. Mauch, S. Tiedemann, T. Kemp, A. Nakamura
Details

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word length of the network parameters or remove weights from the network if they are not needed. In this article we discuss a general framework for network reduction which we call Look-Up Table Quantization (LUT-Q). For each layer, we learn a value dictionary and an assignment matrix to represent the network weights. We propose a special solver which combines gradient descent and a one-step k-means update to learn both the value dictionaries and assignment matrices iteratively. This method is very flexible: by constraining the value dictionary, many different reduction problems such as non-uniform network quantization, training of multiplierless networks, network pruning or simultaneous quantization and pruning can be implemented without changing the solver. This flexibility of the LUT-Q method allows us to use the same method to train networks for different hardware capabilities.
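
The idea of representing weights through a value dictionary and an assignment, updated by a single k-means step, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' solver: the gradient-descent part is omitted, weights are a flat list, and the function names and example values are hypothetical.

```python
def kmeans_step(weights, dictionary):
    """One k-means update on a flat list of weights.

    Returns the updated value dictionary and the assignment (the index of the
    dictionary entry that represents each weight).
    """
    # Assignment step: the nearest dictionary value for every weight.
    assign = [min(range(len(dictionary)), key=lambda k: abs(w - dictionary[k]))
              for w in weights]
    # Update step: each dictionary value becomes the mean of its members;
    # entries with no members are left unchanged.
    new_dict = list(dictionary)
    for k in range(len(dictionary)):
        members = [w for w, a in zip(weights, assign) if a == k]
        if members:
            new_dict[k] = sum(members) / len(members)
    return new_dict, assign

def quantized(dictionary, assign):
    """Reconstruct the quantized weights from dictionary and assignment."""
    return [dictionary[a] for a in assign]

# Hypothetical full-precision weights and a 3-entry initial dictionary.
weights = [0.9, -1.1, 0.1, 1.2]
dictionary, assign = kmeans_step(weights, [-1.0, 0.0, 1.0])
q_weights = quantized(dictionary, assign)
```

With K dictionary entries, each weight needs only about log2(K) bits for its index plus the shared dictionary. Constraining the dictionary (for example, fixing one entry at zero or restricting entries to powers of two) is what would turn the same update into pruning or multiplierless variants, as the abstract describes.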

ThemeDiscrete Exclusion Zone for Dynamic Spectrum Access Wireless Networks
Academic ConferenceIEEE ACCESS
Technology Category5G / IoT
NameC. Sun, R. Jiao
Details

The implementation of a geographic exclusion zone (GEZ) has been a scheme in regulations developed to protect a primary user (PU) in dynamic spectrum access wireless networks, where secondary users (SUs) can transmit only outside the exclusion zone region centered at the PU receiver. After determining the radius of the GEZ, the number of operable nodes in actual deployment is quite uncertain due to the random location of nodes. This poses a certain difficulty for SU spectrum sharing planning. In this paper, we propose an alternative PU protection scheme called the discrete exclusion zone (DEZ), which is shapeless. The PU protection is achieved by switching off the first k − 1 nearest neighboring SUs surrounding the PU. Building on the stochastic geometry of wireless node locations, the conditions under which the mean and the variance of the aggregate interference from SUs to the PU exist are obtained. These conditions define the minimum size of the DEZ. Then, we obtain the closed-form expressions for the mean and the variance as a function of the DEZ size k for a given number of SUs N, including N → ∞. Since it is challenging to obtain a closed-form expression of the density function, we resort to the Gamma distribution to approximate the distribution of the aggregate interference, which is validated by simulations. Finally, the performances of the GEZ and DEZ are investigated in terms of the number of operable SUs outside the GEZ and DEZ, respectively, for achieving a given PU protection requirement. The results show that the DEZ gives a fixed number of operable nodes in the presence of topology randomness associated with the actual SU network deployment.
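
The two ingredients described above, muting the k − 1 nearest SUs and approximating the aggregate interference by a Gamma distribution, can be sketched as follows. The sketch assumes unit transmit power, a pure distance-based path loss d^(−alpha) with no fading, and moment matching as the way to pick the Gamma parameters; these are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def aggregate_interference(distances, k, alpha):
    """Interference at the PU after switching off the k - 1 nearest SUs.

    Assumes unit transmit power and path loss d ** (-alpha) with no fading.
    """
    active = sorted(distances)[k - 1:]  # drop the k - 1 closest SUs
    return sum(d ** (-alpha) for d in active)

def gamma_from_moments(mean, variance):
    """Gamma (shape, scale) whose mean and variance match the given values."""
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale

# Hypothetical SU distances to the PU; with k = 2 the SU at distance 1.0 is muted.
total = aggregate_interference([4.0, 1.0, 2.0], k=2, alpha=2.0)

# Hypothetical mean and variance, standing in for the paper's closed-form expressions.
shape, scale = gamma_from_moments(6.0, 12.0)
```

Because the DEZ is defined by a count rather than a radius, exactly N − (k − 1) SUs stay active whatever the node layout, which is the fixed-number property the abstract highlights.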

ThemeRoom-temperature continuous-wave operation of green vertical-cavity surface-emitting lasers with a curved mirror fabricated on {20−21} semi-polar GaN
Academic ConferenceApplied Physics Express
Technology CategoryImaging / Sensing
NameT. Hamaguchi, Y. Hoshina, K. Hayashi, M. Tanaka, M. Ito, M. Ohara, T. Jyoukawa, N. Kobayashi, H. Watanabe, M. Yokozeki, R. Koda, K. Yanashima
Details

We demonstrate room-temperature continuous-wave operation of a green vertical-cavity surface-emitting laser (VCSEL) with a 20 μm long cavity possessing a dielectric curved mirror formed over a {20−21} semi-polar gallium nitride substrate. The emission wavelength and the threshold current were 515 nm and 1.8 mA, respectively. We also confirmed that white light is generated by overlaying the three primary colors of light, i.e. red, blue and green, emitted only from VCSELs.

ThemeEstimation and prediction of ellipsoidal molecular shapes in organic crystals based on ellipsoid packing
Academic ConferencePLoS ONE
Technology CategoryData Analytics / Cloud Computing
NameD. Ito, R. Shirasawa, Y. Iino, S. Tomiya, G. Tanaka
Details

Crystal structure prediction has been one of the fundamental and challenging problems in materials science. It is computationally demanding to identify molecular conformations and arrangements in organic molecular crystals due to the complexity of intra- and inter-molecular interactions. From a geometrical viewpoint, specific types of organic crystal structures can be characterized by ellipsoid packing. In particular, we focus on aromatic systems which are important for organic semiconductor materials. In this study, we aim to estimate the ellipsoidal molecular shapes of such crystals and predict them from single molecular descriptors. First, we identify the molecular crystals with molecular centroid arrangements that correspond to affine transformations of four basic cubic lattices, through topological analysis of the dataset of crystalline polycyclic aromatic molecules. The novelty of our method is that the topological data analysis is applied to arrangements of molecular centroids instead of those of atoms. For each of the identified crystals, we estimate the intracrystalline molecular shape based on the ellipsoid packing assumption. Then, we show that the ellipsoidal shape can be predicted from single molecular descriptors using a machine learning method. The results suggest that topological characterization of molecular arrangements is useful for structure prediction of organic semiconductor materials.

ThemeEvaluation of Machine Learning Techniques for Hand Pose Estimation on Handheld Device with Proximity Sensor
Academic ConferenceACM CHI Conference on Human Factors in Computing Systems (CHI)
Technology CategoryImmersive Experience
NameK. Arimatsu, H. Mori (Sony Interactive Entertainment)
Details

Tracking finger movement for natural interaction using hands is commonly studied. For vision-based implementations of finger tracking in virtual reality (VR) applications, finger movement is occluded by the handheld device that is necessary for auxiliary input, so tracking finger movement using cameras is still challenging. Finger tracking controllers using capacitive proximity sensors on the surface are starting to appear. However, research on estimating articulated hand pose from curved capacitance sensing electrodes is still immature. Therefore, we built a prototype with 62 electrodes and recorded training datasets using an optical tracking system. We introduced a 2.5D representation to apply convolutional neural network methods to a capacitive image of the curved surface, and two types of network architectures based on recent achievements in the computer vision field were evaluated with our dataset. We also implemented real-time interactive applications using the prototype and demonstrated the possibility of intuitive interaction using fingers in VR applications.

ThemeHighly selective atomic layer etching for semiconductor application
Academic Conference7th International Atomic Layer Etching Workshop
Technology CategoryImaging / Sensing
NameA. Hirata(Sony Semiconductor Solutions Corporation)
Details

The self-limiting process is one of the most important features of atomic layer etching (ALE). The self-limiting process refers to the highly selective etching of a modified layer over a pristine substrate. One ALE cycle consists of a surface modification step and a removal step of the modified layer. In the modification step, the binding energy in the surface reactive layer is reduced so that it is easier to remove than the bulk. For the generation of a reactive layer, chemical adsorption and chemical/physical modification are generally employed. In this study, we investigate ALE of tin-doped indium oxide (ITO) and SiN, and their mechanisms, to achieve high selectivity.
ITO is a difficult-to-etch material, since the boiling points of indium halides are very high (>700 °C). Surface modification through chemical adsorption of reactive species is difficult. Thus, surface modification by energetic hydrogen ions followed by Ar desorption was proposed. The ITO was reduced by hydrogen injection, and generated an In-rich layer on the surface. The In-rich layer of ITO could be selectively etched by controlling the incident ion energy. Thus, the self-limited etching of ITO was demonstrated.
The etch rate selectivity of ITO over a mask material is indispensable for device fabrication. We intentionally controlled the amount/incubation time of Si generated from the upper electrode, and demonstrated the highly selective cyclic etching of ITO/SiO2. The cyclic etching by area-selective surface adsorption of Si could precisely control the etch rates of ITO and SiO2, which resulted in an almost infinite selectivity for ITO over SiO2.
In the case of SiN ALE, the chemical adsorption of a reactive species (CHxFy polymer) was employed to obtain high selectivity with SiO2 and Si. However, the SiN ALE was easily etch-stopped, owing to the excess adsorption of polymer during cyclic etching. Thus, a sequential 3-step ALE (adsorption, desorption, and O2 ash) was proposed. After this 3-step ALE, the SiN surface was oxidized, which resulted in a fluctuation of the etched amount. To overcome these issues, plasma-enhanced conversion ALE was proposed. First, 3-step ALE was performed for SiN ALE, and the surface SiO2 (converted from SiN by oxidation) was generated. Subsequently, highly selective SiO2 ALE over SiN was performed. By combining highly selective SiN and SiO2 ALE, a stable ALE process was realized.
When we use the differences in precursor incubation time among different materials effectively, highly selective etching is expected. Thus, a database of the surface adsorption of many kinds of precursors is strongly required for future highly selective ALE processes.

ThemeSoC compatible 1T1C FeRAM memory array based on ferroelectric Hf0.5Zr0.5O2
Academic ConferenceSymposia on VLSI Technology
Technology CategoryImaging / Sensing
NameJ. Okuno, T. Kunihiro, K. Konishi, H. Maemura, Y. Shuto, F. Sugaya(Sony Semiconductor Solutions Corporation)
Details

This paper experimentally demonstrates fundamental memory array operation of a ferroelectric HfO2-based 1T1C FeRAM. Metal/ferroelectric/metal (MFM) capacitors consisting of a TiN/Hf0.5Zr0.5O2 (HZO)/TiN stack were optimized for a sub-500 °C process. The structures showed excellent performance, such as remanent polarization 2Pr > 40 µC/cm², endurance > 10¹¹ cycles, and 10-year data retention at 85 °C. Furthermore, the MFM capacitors were successfully integrated into a 64 kbit 1T1C FeRAM array including our dedicated circuit for array operation. Back-end-of-line (BEOL) wiring showed no degradation of the underlying CMOS logic. Program and read operations were properly controlled, resulting in 100 % bit functionality at an operating voltage of 2.5 V and an operating speed of 14 ns. This technology matches the requirements of last-level cache (LLC) and embedded non-volatile memory (NVM) in low-power System-on-a-Chip (SoC) devices for IoT applications.

ThemeHigh-Density and Large-Scale MEA System Featuring 236,880 Electrodes at 11.72μm Pitch for Neuronal Network Analysis
Academic ConferenceSymposia on VLSI Technology and Circuits
Technology CategoryImaging / Sensing
NameY. Kato, Y. Matoba, K. Honda, K. Ogawa, K. Shimizu, M. Maehara, C. Yamane, N. Kimizuka, A. Fujiwara, J. Ogi, T. Taura, Y. Oike (Sony Semiconductor Solutions Corporation), A. Odawara, I. Suzuki (Tohoku Institute of Technology, Japan)
Details

Microelectrode arrays (MEAs) allow us to observe electrical activities from neurons at multiple sites. This paper presents a high-density microelectrode array (HD-MEA) for observing neuronal networks at a cellular level, featuring 236,880 electrodes at an 11.72 μm pitch and 33,840 readout channels with a noise level of 5.5 μVrms. The Peltier cooling system is integrated to maintain the temperature of the electrodes at approximately 37 °C. Moreover, electrical signals for the axonal propagation of rat neurons are successfully recorded.

ThemeSmart Vision Sensor
Academic ConferenceVLSI Symposium 2020 Friday Forum
Technology CategoryImaging / Sensing
NameH. Wakabayashi (Sony Semiconductor Solutions Corporation)
Details

Today, the performance of image sensors exceeds the capabilities of the human eye and can provide more immersive experiences as a result. In addition, image sensors for sensing can digitize many kinds of information beyond typical two-dimensional images, and applications such as authentication, recognition, autonomous machine control, and wireless products further process this useful and efficient information. These latest sensors enhance the quality of services, and image sensing systems are evolving into a broader information conversion tool.
In this talk, the requirements for smart vision sensors in high-speed and event-based processing systems for industrial automation and always-on sensing applications will be discussed. The performance requirements for these new applications will be demonstrated. Finally, our vision of how the upcoming combination of sensor and artificial intelligence technologies will profoundly influence our lifestyle will be introduced.

ThemeA Back Illuminated 10μm SPAD Pixel array comprising Full trench isolation and Cu-Cu bonding with over 14% PDE at 940nm
Academic ConferenceInternational Electron Devices Meeting (IEDM 2020)
Technology CategoryImaging / Sensing
NameK. Ito, Y. Otake, Y. Kitano, A. Matsumoto, J. Yamamoto, T. Ogasahara, H. Hiyama, R. Naito, K. Takeuchi, T. Tada, K. Takabayashi, H. Nakayama, K. Tatani, T. Hirano, T. Wakano (Sony Semiconductor Solutions Corporation)
Details

We developed a back-illuminated (BI) 10 μm SPAD array sensor using pixel-level Cu-Cu bonding and metal-buried Full Trench Isolation. Using a 7 μm thick Si layer and a fine-tuned potential and process, over 14% PDE at λ=940nm and best-in-class DCR were achieved. Low timing jitter and suppressed X-talk were also demonstrated.

ThemeLow power consumption and high resolution 1280X960 Gate Assisted Photonic Demodulator pixel for indirect Time of flight
Academic ConferenceInternational Electron Devices Meeting 2020 (IEDM 2020)
Technology CategoryImaging / Sensing
NameY. Ebiko, H. Yamagishi, K. Tatani, H. Iwamoto, Y. Moriyama, Y. Hagiwara, S. Maeda, T. Murase, T. Suwa, H. Arai, Y. Isogai, S. Hida, S. Kameda, T. Terada, K. Koiso (Sony Semiconductor Solutions Corporation), F. T. Brady, S. Han, A. Basavalingappa (Sony Electronics Inc. Image Sensor Design Center, Rochester, New York, United States of America), T. Michiel, T. Ueno (Sony Depth Sensing Inc., Brussels, Belgium)
Details

A 1280×960 floating diffusion storage global shutter image sensor is implemented in a 3D stacked back-illuminated indirect time-of-flight (iToF) sensor. The sensor achieves 18,000 e− full well capacity and 32% quantum efficiency (QE) with a pyramid surface for diffraction (PSD) structure [1], utilizing a 3.5 μm pixel. Low power consumption is also achieved, due to the low leakage current of the iToF pixel and low-resistance Cu-Cu connection metal wiring. These device architectures enable high-resolution, wide-dynamic-range 3D depth sensing for both near and far objects.

ThemeImaging Devices and Systems for Future Society
Academic ConferenceInternational Electron Devices Meeting 2020 (IEDM 2020)
Technology CategoryImaging / Sensing
NameY. Oike (Sony Semiconductor Solutions Corporation)
Details

The evolution of image sensors and the prospects of utilizing advanced imaging technologies promise to improve our quality of life. Since CMOS image sensors surpassed CCDs with the advent of column‐parallel ADCs and back‐illuminated technology, image sensor applications have been expanding to mobile devices, wearables, medical solutions, security networks, factory automation and autonomous driving. Stacking technologies are now drastically accelerating the performance improvement and enhancing the functionality of imaging devices. The fine-pitch connection between the pixel and logic layers makes a pixel-parallel circuit architecture available for the next evolution. New materials for the photoconductive layer extend the sensitivity to a wide range of wavelengths. This tutorial gives a broad overview of the key device technologies for image sensors, as well as the circuit techniques, image signal processing and performance characteristics that enable imaging applications in various fields. The next challenge for imaging systems in a future society will be discussed, where imaging devices integrate edge computing functions and expand their sensing capability to spatial depth, temporal dynamics and invisible light.

ThemeStudy of Lower Voltage Protection against Plasma Process Induced Damage with Quantitative Prediction Technique
Academic ConferenceInternational Reliability Physics Symposium
Technology CategoryImaging / Sensing
NameY. Hiura, S. Miyake, S. Mori, K. Matsumoto, H. Ohnuma (Sony Semiconductor Solutions Corporation)
Details

A simple method is proposed for quantitative prediction of the Vth shift due to plasma-process-induced charging damage, taking the effect of the protection device into account. This prediction shows that the gate oxide of a transistor may be stressed by high voltage during plasma processing even with an efficient protection device such as a gated diode, and that an enormous Vth shift can then occur. In this paper, we propose a new protection device that works at a much lower voltage with a higher current.

ThemeIGZO based compute cell for analog in-memory computing - DTCO analysis to enable ultra-low power AI at edge
Academic ConferenceEuropean Solid-State Device Research Conference
Technology CategoryImaging / Sensing
NameD. Saito (Sony Semiconductor Solutions Corporation), J. Doevenspeck, S. Cosemans, H. Oh, M. Perumkunnil, I.A. Papistas, A. Belmonte, N. Rassoul, R. Delhougne, G. Kar, P. Debacker, A. Mallik, D. Verkest, M.H. Na (imec)
Details

We propose, for the first time, an IGZO-based 2T1C compute cell (IGZO-cell) for analog in-memory computing. To assess the impact of the IGZO-cell with the periphery on power and accuracy, a PyTorch framework was developed to analytically model the analog components. Results are reported for a ResNet20 network on the CIFAR-10 benchmark. A state-of-the-art energy efficiency of 15 POPS/W including the periphery is achieved by using our proposed IGZO-cell with CMOS compatibility. Finally, it is shown that, with a properly trained model, there is no degradation of test accuracy with 10% device-to-device variability in the IGZO devices.

ThemeLensless Imaging with Focusing Sparse URA Masks in Long-Wave Infrared and its Application for Human Detection
Academic ConferenceIEEE European Conference on Computer Vision 2020
Technology CategoryImaging / Sensing
NameI. Reshetouski, H. Oyaizu, K. Nakamura, R. Satoh, S. Ushiki, R. Tadano, A. Ito, J. Murayama
Details

We introduce a lensless imaging framework for contemporary computer vision applications in long-wavelength infrared (LWIR). The framework consists of two parts: a novel lensless imaging method that utilizes the idea of local directional focusing for optimal binary sparse coding, and a lensless imaging simulator based on the Fresnel-Kirchhoff diffraction approximation. Our lensless imaging approach, besides being computationally efficient, is calibration-free and allows for wide FOV imaging. We employ our lensless imaging simulation software for optimizing reconstruction parameters and for synthetic image generation for CNN training. We demonstrate the advantages of our framework on a dual-camera system (RGB-LWIR lensless), where we perform CNN-based human detection using the fused RGB-LWIR data.

ThemeAccurate Polarimetric BRDF for Real Polarization Scene Rendering
Academic ConferenceIEEE European Conference on Computer Vision 2020
Technology CategoryImaging / Sensing
NameY. Kondo, T. Ono, L. Sun, Y. Hirasawa, J. Murayama
Details

Polarization has been used to solve many computer vision tasks such as Shape from Polarization (SfP), but existing methods suffer from the ambiguity problems of polarization. To overcome such problems, some research works have suggested using convolutional neural networks (CNNs). However, acquiring a large-scale dataset with polarization information is a very difficult task. If there were an accurate model that could describe the complicated phenomenon of polarization, we could easily produce synthetic polarized images of various situations to train a CNN. In this paper, we propose a new polarimetric BRDF (pBRDF) model. We demonstrate its accuracy by fitting our model to data measured under a variety of light and camera conditions. We render polarized images using this model and use them to estimate surface normals. Experiments show that a CNN trained on our polarized images is more accurate than one trained on RGB images only.

ThemeMocLis: A Moving Cell Support Protocol Based on Locator/ID Separation for 5G System
Academic ConferenceIEEE International Conference on Communications(ICC)
Technology Category5G / IoT
NameT. Ochiai, K. Matsueda,
F. Teraoka (Keio University, Japan)
H. Takano, R. Kimura,
R. Sawai (Sony Corporation)
Details

In the LTE/LTE-Advanced (LTE-A) system, the user plane for a user equipment (UE) is provided by tunneling, which causes header overhead, processing overhead, and management overhead. In addition, the LTE-A system does not support moving cells, which are composed of a mobile Relay Node (RN) and the UEs attached to it. There are several proposals for moving cells in the LTE-A system and the 5G system; however, all of them rely on tunneling for the user plane, which means that none of them avoid the tunneling overheads. This paper proposes MocLis, a moving cell support protocol based on a Locator/ID split approach. MocLis does not use tunneling. Nested moving cells are supported. The signaling cost for handover of a moving cell is independent of the number of UEs and nested RNs in the moving cell. MocLis is implemented on Linux as user-space daemons and a modified kernel. The measurement results show that the attachment time and handover time are short enough for practical use. TCP throughput in MocLis is higher than that in the tunneling-based approaches.

ThemeMIXED PRECISION DNNS: ALL YOU NEED IS A GOOD PARAMETRIZATION
Academic ConferenceInternational Conference on Learning Representations (ICLR)
Technology CategoryAI / Robotics
NameS. Uhlich, L. Mauch,
F. Cardinaux,
K. Yoshiyama,
J. A. García, S. Tiedemann,
T. Kemp (Sony Europe B.V.)
A. Nakamura (Sony Corporation)
Details

Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows the quantizer’s parameters to be learned using gradient methods. We show that a suitable parametrization of the quantizer is the key to achieving stable training and good final performance. Specifically, we propose to parametrize the quantizer with the step size and dynamic range. The bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet and we obtain mixed precision DNNs with learned quantization parameters, achieving state-of-the-art performance.
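As a rough illustration of the parametrization described above, the following sketch applies a uniform symmetric quantizer defined by a step size `d` and a dynamic range `q_max`, and then infers the bitwidth from those two values. The function names and the exact bitwidth formula (a signed quantizer with one sign bit) are assumptions made for this example and are not taken from the authors' code.

```python
import math


def quantize(x: float, d: float, q_max: float) -> float:
    """Uniform symmetric quantizer: clip to [-q_max, q_max], then round to a multiple of d."""
    clipped = max(-q_max, min(q_max, x))
    return d * round(clipped / d)


def inferred_bitwidth(d: float, q_max: float) -> int:
    """Bits needed to index every level in [-q_max, q_max] with step d (assumed formula)."""
    levels_per_side = q_max / d  # number of positive steps
    return math.ceil(math.log2(levels_per_side + 1)) + 1  # +1 for the sign bit


# Hypothetical parameters: step 0.25 and range 1.75 give levels -7..7, i.e. 4 bits.
print(inferred_bitwidth(0.25, 1.75))
print(quantize(0.3, 0.25, 1.75), quantize(5.0, 0.25, 1.75))
```

In a training setup, `d` and `q_max` would be the learned quantities and the bitwidth would only be read off afterwards, which is the point the abstract makes.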

ThemeA Practical Method for 3D-Modeling of Glass Weave
Academic ConferenceDesignCon
Technology CategoryDesign / Manufacturing / Operation
NameK. Nonaka, T. Nakamura,
K. Sawada (Sony LSI Design Incorporated)
Details

As the signal frequency of interconnects becomes higher, it has become important to build printed circuit board models for SI simulation that include the effect of glass weave. Most previous works were based on measurement results. They have adequate accuracy, but are not suitable for consumer product design with low cost and/or short TAT. In this paper, we propose a practical method for modeling glass weave using only general information obtained from board manufacturers. To confirm the accuracy of this method, we designed several measurement boards and confirmed a high correlation between simulation and measurement.

ThemeA study of energy resolution in CPD indirect photon-counting X-ray imaging
Academic ConferenceInternational Society for Optics and Photonics (SPIE)
Technology CategoryImaging / Sensing
NameT. Nishihara, O. Kumagai,
T. Izawa (Sony Semiconductor Solutions Corporation)
H. Baba (Sony Global Manufacturing & Operations Corporation)
N. Shinohara (Gifu University of Medical Science)
Details

We recently reported indirect X-ray photon-counting imaging using CMOS photon detectors (CPDs) and showed its high spatial resolution. However, at that time its energy resolution was totally unknown. Thus, there was a question about whether it can detect low-energy X-ray photons of 20keV with sufficient quantum efficiency. In this study we exposed the CPD test devices to nearly monoenergetic X-ray photons of 19.5keV, using clinical mammography equipment with additional Mo filters, and measured the output intensity distributions. We also fitted our intensity distribution model to the results, estimating the signal yield per keV and the parameters for signal variation.

ThemeA 132dB Single-Exposure-Dynamic-Range CMOS Image Sensor with High Temperature Tolerance
Academic ConferenceInternational Solid-State Circuits Conference (ISSCC)
Technology CategoryImaging / Sensing
NameY. Sakano, T. Toyoshima,
R. Nakamura, T. Asatsuma,
Y. Hattori, N. Kawazu,
T. Matsuura, T. Iinuma,
T. Toya, A. Suzuki,
Y. Motohashi, J. Azami,
Y. Tateshita, T. Haruta (Sony Semiconductor Solutions Corporation)
T. Yamanaka, R. Yoshikawa,
T. Watanabe (Sony Semiconductor Manufacturing Corporation)
Details

We fabricated a 5.4-megapixel stacked backside-illuminated CIS using a sub-pixel architecture with high-temperature tolerance. The in-pixel floating capacitor and some transistors are embedded above the large photodiode, and correlated double sampling read-out of the small photodiode is implemented. Thanks to this architecture, a single-exposure dynamic range of 132dB and an ultra-low random noise of 0.68e-rms can be achieved while maintaining more than 25dB of minimum composition SNR at 100℃.

ThemeA 0.50e-rms Noise 1.45μm-Pitch CMOS Image Sensor with Reference-Shared In-Pixel Differential Amplifier at 8.3Mpixel 35fps
Academic ConferenceInternational Solid-State Circuits Conference (ISSCC)
Technology CategoryImaging / Sensing
NameM. Sato, Y. Yorikado,
Y. Matsumura, H. Naganuma,
E. Kato, T. Toyofuku,
A. Kato, Y. Oike (Sony Semiconductor Solutions Corporation)
Details

We developed a 1.45 μm pixel, 8.3 MPix, back-illuminated stacked 1/2.8-inch CMOS image sensor. It offers two selectable conversion gains, neither of which requires additional transistors within the pixel area. A readout noise of 0.50e-rms is realized using a reference-shared in-pixel differential amplifier with correlated multiple sampling at 35 fps. The operation point of the amplifier can be adjusted with a negative feedback reset technique, enabling a 200 mV output swing and 2.5% PRNU.

ThemeA 1280×720 Back-Illuminated Stacked Temporal Contrast Event-Based Vision Sensor with 4.86μm Pixels, 1.066GEPS Readout, Programmable Event-Rate Controller and Compressive Data-Formatting Pipeline
Academic ConferenceInternational Solid-State Circuits Conference (ISSCC)
Technology CategoryImaging / Sensing
NameT. Finateu, D. Matolin,
A. Mascheroni, E. Reynaud,
L. Chotard, F. LeGoff,
C. Posch (Prophesee S.A.)
A. Niwa, K. Tsuchimoto,
H. Takahashi, H. Wakabayashi,
Y. Oike (Sony Semiconductor Solutions Corporation)
P. Mostafalu, F. Brady (Sony Electronics Incorporated)
Details

Event-based vision and image sensors, due to large circuitry in each pixel, benefit from wafer-stacking to achieve competitive pixel sizes and resolutions. This paper presents a 1280×720 resolution, 4.86μm pixel-pitch Back-Illuminated (BI) stacked temporal contrast event-based vision and image sensor with 1μs row-level time-stamping, 1.066GEPS event readout, a digital pre-processing and data formatting pipeline, and 43mW to 94mW power consumption for high-speed low-power machine vision applications.

ThemeA 0.5V BLE Transceiver with a 1.9mW RX Achieving -96.4dBm Sensitivity and 4.1dB Adjacent Channel Rejection at 1MHz Offset in 22nm FDSOI
Academic ConferenceInternational Solid-State Circuits Conference (ISSCC)
Technology CategoryDesign / Manufacturing / Operation
NameM. Tamura, H. Takano,
S. Shinke, H. Fujita,
H. Nakahara, N. Suzuki,
Y. Nakada, Y. Shinohe,
S. Etou, Y. Katayama (Sony Semiconductor Solutions Corporation)
T. Fujiwara (Sony LSI Design Incorporated)

ThemeFeature Quantities of EEG for a Model to Estimate Human Internal States of Concentration and Relaxation.
Academic Conference41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Technology CategoryImmersive Experience
NameN. Sazuka, Y. Komoriya, T. Ezaki(Sony Corporation),
T. Oba, H. Ohira(Nagoya University)
Details

We proposed novel feature quantities of the electroencephalogram (EEG) to effectively detect internal states in humans. A machine learning model using the proposed feature quantities, based on time series of EEG powers, estimated states of concentration and relaxation with higher accuracy than another model using conventional feature quantities, namely the average EEG power per frequency band.

ThemeAb initio analysis of defect formation and dopant activation in P and As co-doped Si
Academic ConferenceEuropean Materials Research Society (E-MRS)
Technology CategoryDesign / Manufacturing / Operation
NameN. Nakazaki (Sony Semiconductor Solutions Corporation)
E. Rosseel, C. Porret,
A. Hikavyy, R. Loo,
N. Horiguchi, G. Pourtois (imec)
Details

A highly-doped epitaxially-grown source/drain is a key element for the development of next-generation transistors such as FinFET and GAA, where a high active dopant concentration is required for device optimization. This in turn requires a deep understanding of the defect formation mechanisms related to dopant deactivation. This paper presents a density functional theory study of dopant-defect complex formation in P and As co-doped Si layers (Si:P+As). Emphasis is placed on the impact of As co-doping on defect formation in Si:P. Numerical results indicate that for both P and As, a stable deactivation vacancy (V) complex in Si is DxV (D = P or As, x = 1–4), where the V is surrounded by dopants. The formation enthalpy of AsxV is lower than that of PxV, implying that As traps V more efficiently than P. In turn, As co-doping is expected to help increase the active P concentration by releasing P atoms from the complex. In addition, our simulation results indicate that a stable interstitial dopant-based complex in Si is a split interstitial (S), where two deactivated dopants occupy one Si site. As is found to be more likely than P to form S in Si at high dopant concentration, implying that the solubility of As in Si is lower than that of P. Our simulations suggest that at low concentration, As co-doping of Si:P enhances P activation. This is in good agreement with our experimental observations, where a 1.2 % As+P concentration grown at 450°C leads to a higher active dopant concentration than for Si:P alone.

ThemeCharacterization of Impact of Vertical Stress on FinFETs
Academic ConferenceEuropean Microelectronics and Packaging Conference (EMPC)
Technology CategoryDesign / Manufacturing / Operation
NameT. Furuhashi, M. Haneda,
T. Sasaki, Y. Kagawa,
Y. Ooka, T. Hirano,
M. Saito, K. Ohno,
H. Iwamoto (Sony Semiconductor Solutions Corporation)
Y. Liu, G. Hiblot,
K. Vanstreels, M. Gonzalez,
D. Velenis, G. Beyer,
G. Van der Plas, I. De Wolf,
E. Beyne (imec)
Details

We investigated the impact of vertical stress on FinFETs using in-situ electrical measurements in a nano-indenter setup. We found that the impact of vertical stress on Id for N- and P-type FinFETs increases for longer gate lengths. According to mechanical simulations, if vertical stress is applied to the sample surface, the FinFET experiences not only vertical compressive stress but also non-negligible horizontal compressive stress. Furthermore, using S-Device simulation, we confirmed that the in-plane compressive stresses have different values in the directions perpendicular and parallel to the fin. The experimental Id variations can be explained by a change in the electron (hole) effective mass and electron scattering, considering the vertical and horizontal piezoresistance coefficients.

ThemeCrystal LED Display System for Immersive Viewing Experience
Academic ConferenceInternational Display Workshops (IDW)
Technology CategoryAudio / Visual
NameK. Tomoda, N. Kikuchi,
H. Kadota (Sony Semiconductor Solutions Corporation)
G. Biwa(Sony Corporation)
Details

We developed a novel active matrix driving technology that integrates RGB micro LEDs and a micro IC in each pixel for our Crystal LED display system. With precise tiling technology, a large-scale image with immersive viewing experience can be delivered.

ThemeWatt-class Operation of GaN-based Blue and Green Laser Diodes
Academic ConferenceInternational Display Workshops (IDW)
Technology CategoryAudio / Visual
NameH. Watanabe, Y. Nakayama,
Y. Hoshina, M. Murayama,
T. Koyama, N. Fuutagawa,
H. Kawanishi, K. Yanashima(Sony Corporation)
Y. Kikuchi, Y. Kogure,
Y. Kadowaki (Sony Semiconductor Manufacturing Corporation)
K. Mizutani, T. Uemura (Toyoda Gosei Co., Ltd.)
Details

Visible laser diodes have recently attracted a great deal of attention as light sources for various display and lighting applications. In this paper, recent progress in green and blue lasers developed at Sony, which realize watt-class output power, is reported.

ThemeVoltage-induced dynamic magnetization switching with low write error rate
Academic ConferenceAnnual Conference on Magnetism and Magnetic Materials (MMM)
Technology CategoryDesign / Manufacturing / Operation
NameT. Nozaki, T. Yamamoto,
H. Imamura, T. Nozaki,
M. Konoto, H. Kubota,
A. Fukushima, Y. Suzuki (Spintronics Research Center, AIST)
M. Endo, H. Ohmori,
Y. Higo, M. Hosomi (Sony Semiconductor Solutions Corporation)

ThemeWatt-Class Operation of Green Laser Diodes on Semipolar {20-21} Gallium Nitride Substrates
Academic ConferenceInternational Conference on Nitride Semiconductors (ICNS)
Technology CategoryAudio / Visual
NameY. Nakayama, H. Watanabe,
M. Murayama, T. Koyama,
N. Fuutagawa, H. Kawanishi,
K. Yanashima(Sony Corporation)
Y. Kadowaki, Y. Kogure (Sony Semiconductor Manufacturing Corporation)
T. Uemura (Toyoda Gosei Co., Ltd.)

ThemeInsights of Different Etching Properties between CW and ALE Processes using 3D Voxel-Slab Model
Academic ConferenceInternational Conference on Atomic Layer Deposition / International Atomic Layer Etching Workshop (ALD/ALE)
Technology CategoryDesign / Manufacturing / Operation
NameN. Kuboi, T. Tatsumi,
J. Komachi, S. Yamakawa (Sony Semiconductor Solutions Corporation)

ThemeLow‐Temperature Selective Growth of Heavily Boron‐Doped Germanium Source/Drain Layers for Advanced pMOS Devices
Academic ConferencePhysica Status Solidi A
Technology CategoryDesign / Manufacturing / Operation
NameC. Porret, A. Vohra,
A. Hikavyy, B. Douhard,
J. Meersschaut,
J. Bogdanowicz,
E. Rosseel, G. Pourtois,
R. Langer, R. Loo (imec)
N. Nakazaki (Sony Semiconductor Solutions Corporation)
Details

The peculiarities of heavily boron‐doped germanium, selectively grown at low temperature by means of a cyclic deposition and etch chemical vapor deposition process, are investigated through the analysis of the structural and electrical material properties. The incorporation of B in Ge can exceed 6 × 10²⁰ cm⁻³, close to a factor of 100 above the solubility limit, without any significant degradation of the Ge:B crystalline quality, although high B‐doping induces an unwanted contraction of the Ge lattice. Micro‐Hall effect measurements and the multiring circular transmission line method are used to evaluate the active carrier concentrations and resistivities of Ti/Ge:B contacts. Even though the resistivity of as‐grown layers saturates for chemical B concentrations approaching 1 × 10²¹ cm⁻³ and increases beyond that level, a contact resistivity below 3 × 10⁻⁹ Ω cm² is obtained for the highest active doping concentration, showing that a compromise must be found to decrease the total contact resistance. Finally, first principles simulations are used to understand dopant deactivation mechanisms in the Ge:B system. In conclusion, the formation of boron‐interstitial clusters is most likely the cause of electrical performance degradation at high doping values.

ThemeGrowth of GaSb Nano-ridges on patterned 300 mm Si wafers
Academic ConferenceInternational Conference on Crystal Growth and Epitaxy (ICCGE)
Technology CategoryDesign / Manufacturing / Operation
NameM. Baryshnikova, Y. Mols,
R. Alcotte, H. Han,
T. Hantschel, N. Bosman,
O. Richard, M. Pantouvaki,
J. Van Campenhout,
R. Langer,
B. Kunert (imec)
Y. Ishii (Sony Semiconductor Solutions Corporation)
Details

The integration of III/V materials on Si substrate holds great potential for the fabrication of new multifunctional devices but is also challenging. To combine optoelectronic applications in the near- and mid-IR spectral regions with Si electronics, it is of great interest to develop a growth process of GaSb based narrow band gap compounds on Si substrate. The essential requirement for an efficient device performance is a low threading dislocation density in the active layer. One of the techniques to achieve a high crystal quality in mismatched III/V-Si hetero-epitaxy is based on the selective area growth (SAG) in high-aspect-ratio structures patterned on a Si substrate. The strain induced plastic relaxation occurs in a restricted region close to the hetero-interface, e.g. inside a narrow trench. This leads to a confinement of threading dislocations inside the trench and allows the grown-out nano-ridge material to be free of defects. This so-called aspect ratio trapping (ART) approach reduces the defect density and was already successfully demonstrated for different material systems [1]. It is important to note here that the III/V integration applying SAG together with ART has the potential to be fully compatible with CMOS technology and mass production. In our study we deposit GaSb layers in narrow trenches formed in a several hundred nanometer thick SiO2 layer. The inspection of material properties of the samples is carried out by means of scanning electron microscopy (SEM), atomic force microscopy (AFM), high-resolution X-ray diffraction (HRXRD), transmission electron microscopy (TEM) and electron channeling contrast imaging (ECCI) techniques. In this study we investigate the defect reduction by ART in GaSb layers using different seed layers and grown out from the trenches of varying sizes. The first results indicate that the threading dislocation density is improved in GaSb nano-ridges grown on GaAs seed layers. 
Values below 5 × 10⁶ cm⁻² can be reached, which are comparable with results for GaAs fins. Another significant aspect of the GaSb nano-ridge growth is the control of its shape. By applying specific deposition conditions, it is possible to deposit GaSb fins with “funnel”, “diamond” and “box” profiles. The growth parameters leading to a flat (001) surface on top of the nano-ridges and the lowest defect density have been used to grow the very first heterostructures based on InₓGa₁₋ₓSb and InAs.

[1] Kunert et al., 2018 Semicond. Sci. Technol. 33 093002

ThemeDifference between Similars: a Novel Method to Use Topic Models for Sensor Data Analysis
Academic ConferenceWorkshop on Data Mining in Industrial Internet of Things (DMIIOT)
Technology CategoryDesign / Manufacturing / Operation
NameT. Masada (Nagasaki University)
T. Eguchi, D. Hamaguchi (Sony Semiconductor Manufacturing Corporation)
Details

We propose a novel method to use the topics obtained by topic modeling for sensor data analysis. This paper describes a case study where we perform an exploratory data analysis of manufacturing sensor data by using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from the sensors installed in a closed factory environment. Each sensor gives a different type of measurement representing the same manufacturing process, operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When we apply LDA to natural language documents, the resulting topics are widely different from each other, because the documents intrinsically show considerable diversity. In contrast, our data, coming from repeatedly operated manufacturing processes, show only limited diversity. As a result, LDA provides very similar topics that differ only slightly. Our main and unexpected finding is that the difference between similar topics is useful in discovering remarkable change patterns. We performed an experiment over data sets containing sensor measurements collected in the factory. The results reveal that the subtle difference between very similar topics often corresponds to an interesting change pattern of sensor outputs.
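The preprocessing step described above (discretizing sensor measurements and building a bag-of-words representation per lot) can be sketched as follows. This is only an illustration under stated assumptions: it uses fixed-width bins over one shared value range, invents the word naming scheme `<sensor>_bin<k>`, and stops before the LDA step itself. The sensor names and readings are hypothetical.

```python
from collections import Counter


def discretize(values, lo, hi, n_bins):
    """Map each measurement to a histogram bin index in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) / width))) for v in values]


def lot_to_bag_of_words(sensor_series, lo, hi, n_bins=4):
    """Turn one lot's readings into word counts, one word per (sensor, bin) pair."""
    bag = Counter()
    for sensor, values in sensor_series.items():
        for b in discretize(values, lo, hi, n_bins):
            bag[f"{sensor}_bin{b}"] += 1
    return bag


# One hypothetical lot with two sensors, readings normalized to [0, 1].
lot = {"temp": [0.1, 0.2, 0.9], "pressure": [0.5, 0.55]}
bag = lot_to_bag_of_words(lot, lo=0.0, hi=1.0)
print(bag)
```

Each lot then plays the role of a document, and the resulting count vectors could be passed to any LDA implementation.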

ThemeA 6.9 μm Pixel-Pitch 3D Stacked Global Shutter CMOS Image Sensor with 3M Cu-Cu connections
Academic ConferenceIEEE International Conference on 3D System Integration (3DIC)
Technology CategoryImaging / Sensing
NameT. Miura, M. Sakakibara,
H. Takahashi, T. Taura,
K. Tatani, Y. Oike,
T. Ezaki (Sony Semiconductor Manufacturing Corporation)
Details

In this paper, we report on a 3D stacked global shutter CMOS image sensor with 3M Cu-Cu connections. Using fine-pitch Cu-Cu connection technology with a large number of connections, we achieved 1.46M pixels of 6.9 μm × 6.9 μm size. The pixel evaluation result shows that all 3M Cu-Cu connections have been successfully connected.

Theme3D Integration Technologies For The Stacked CMOS Image Sensors
Academic ConferenceIEEE International Conference on 3D System Integration (3DIC)
Technology CategoryImaging / Sensing
NameY. Kagawa, H. Iwamoto (Sony Semiconductor Solutions Corporation)
Details

Nowadays, the Internet-Of-Things (IoT) consists of a variety of LSIs. For use in different situations, small, multi-functional and high-performance LSIs are strongly needed. One promising solution is System-In-Package (SiP), such as multi-chip 2D packages and 3D stack packages. 3D chip stacking technology in particular can easily integrate chips with different functions in a small system. We have contributed to the development of multifunctional, high-performance products with various 3D chip stacking technologies for many years.

ThemeAtomic-level surface reaction control; the next era of dry etching technology
Academic ConferenceAsia-Pacific International Symposium on the Basics and Applications of Plasma Technology (APSPT)
Technology CategoryDesign / Manufacturing / Operation
NameM. Fukasawa (Sony Semiconductor Solutions Corporation)

ThemeVoltage-controlled magnetic anisotropy in an ultrathin Ir-doped Fe layer with a CoFe termination layer
Academic ConferenceApplied Physics Letters Materials (APL Materials)
Technology CategoryDesign / Manufacturing / Operation
NameT. Nozaki, T. Yamamoto,
T. Nozaki, M. Konoto,
H. Kubota, A. Fukushima,
Y. Suzuki, S. Yuasa (Spintronics Research Center, AIST)
M. Endo, H. Ohmori,
Y. Higo, M. Hosomi (Sony Semiconductor Solutions Corporation)
M. Tsujikawa, M. Shirai (Tohoku University)

ThemeNanophotonics Contributions to state-of-the-art CMOS Image Sensors
Academic ConferenceInternational Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameS. Yokogawa (Sony Semiconductor Solutions Corporation)
Details

Recent progress in back-illuminated CMOS image sensors (BI-CISs) is reviewed, focusing on pixel improvements achieved by designing optical properties with subwavelength-scale structures and photonics technologies. These technologies contribute not only to improving basic BI-CIS performance but also to adding new functions for versatile sensing applications.

ThemeA 1/2inch 48M All PDAF CMOS Image Sensor Using 0.8μm Quad Bayer Coding 2×2OCL with 1.0lux Minimum AF Illuminance Level
Academic ConferenceInternational Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameT. Okawa, S. Ooki,
H. Yamajo, M. Kawada,
M. Tachi, K. Goi,
M. Nakamizo, T. Ogasahara,
Y. Kitano, K. Tatani (Sony Semiconductor Solutions Corporation)
T. Yamasaki, H. Iwashita (Sony Semiconductor Manufacturing Corporation)
Details

We created the world's first all-PDAF CMOS image sensor using a 2x2 on-chip lens architecture. It has 48M pixels in a 1/2-inch format with 0.8μm Quad Bayer coding for high resolution and HDR functionality, and all of its PDAF pixels achieve a minimum AF illuminance level of 1 lux.

ThemeThree-layer Stacked Color Image Sensor With 2.0-μm Pixel Size Using Organic Photoconductive Film
Academic ConferenceInternational Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameH. Togashi, T. Watanabe,
M. Joei, T. Hayashi,
S. Hirata, S. Fukuoka,
Y. Ando, Y. Sato,
J. Yamamoto, I. Yagi,
F. Koga, T. Yamaguchi,
Y. Oike (Sony Semiconductor Solutions Corporation)
M. Murata, M. Kuribayashi,
T. Ezaki, T. Hirayama(Sony Corporation)
Details

A three-layer stacked color image sensor was formed using an organic film. The sensor reduces the false-color problem because it does not require demosaicing. Furthermore, with the 2.0-μm pixel image sensor, improved spectral characteristics, owing to green absorption by the organic film above the red/blue photodiode, were successfully demonstrated.

ThemeHigh-definition Visible-SWIR InGaAs Image Sensor using Cu-Cu Bonding of III-V to Silicon Wafer
Academic ConferenceInternational Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameS. Manda, R. Matsumoto,
S. Saito, S. Maruyama,
H. Minari, T. Hirano,
T. Takachi, N. Fujii,
Y. Yamamoto, Y. Zaizen,
T. Hirano,
H. Iwamoto (Sony Semiconductor Solutions Corporation)
Details

We developed a back-illuminated InGaAs image sensor with 1280 x 1040 pixels at 5-μm pitch by using Cu-Cu hybridization to connect different materials: a III-V InGaAs/InP photodiode array and a silicon readout integrated circuit (ROIC). A prototype device showed high sensitivity from visible to SWIR wavelengths and low dark current.

ThemeDamage recovery and low‐damage etching of ITO in H2/CO plasma: Effects of hydrogen or oxygen
Academic ConferencePlasma Processes and Polymers (PPAP)
Technology CategoryDevice / Material
NameA. Hirata,
M. Fukasawa,
K. Kugimiya,
K. Nagaoka (Sony Semiconductor Solutions Corporation)
K. Karahashi,
S. Hamaguchi (Osaka University)
Details

Hydrogen‐containing plasma etching of tin‐doped indium oxide (ITO) causes surface reduction damage induced by the plasma itself. Damage recovery by using O‐containing plasma and low‐damage etching in plasma with hydrogen and oxygen were investigated. While recovery was possible with O2 plasma after H‐containing plasma exposure, O2 plasma caused excess oxidation of the ITO surface, which can degrade device properties. Simultaneous injection of hydrogen and oxygen recovered the reduced ITO to its initial state. The etching performance was also investigated; low‐damage etching was achieved with hydrogen and oxygen‐mixed plasma by controlling the balance between surface reduction and oxidation.

ThemeAI x Robotics: Technology Challenges and Opportunities in Sensors, Actuators, and Integrated Circuits
Academic ConferenceInternational Solid-State Circuits Conference(ISSCC)
Technology CategoryAI / Robotics
NameM. Fujita (Sony Corporation)
Details

The term artificial intelligence (AI) was first used in 1956 at the Dartmouth conference. In those days it was also called symbolic AI (Symbolic-AI). For example, let us assume a block world problem in Fig 17.1.1 (right), where a block is represented as a symbol, “Block1”, and it can be operated on by operators such as “PICKUP(Block1)”. In order to apply the operator “PICKUP” to the target object, “Block1”, the AI system has to check a pre-condition such as “CLEAR Block1”, which means there is no object on “Block1”. Fig. 17.1.1 (right) shows an example of a task from State-A to State-B. The system has to search the possible operators and pre-conditions so that State-B is achieved. Many basic algorithms developed in the Symbolic-AI era, including the A*-search algorithm, are still often used today. Shakey is the representative example of an intelligent robot based on Symbolic-AI. It was a wheel-based mobile robot equipped with a TV camera, a laser range finder, etc. It could move blocks in the real world using Symbolic-AI technologies. Its behavior control architecture is shown in Fig 17.1.1 (left). It has three steps, SENSE, PLAN, and ACT, and is therefore known as the SENSE-PLAN-ACT architecture. It is computationally intensive, especially in the PLAN computation, and therefore struggles when the environment is changing dynamically.
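The operator and pre-condition search described above can be illustrated with a very small block world planner. This is a minimal sketch under assumptions: the fact names, the four operators, and the two-block start and goal states are hypothetical stand-ins for the State-A and State-B of the figure, and plain breadth-first search is used in place of A*-search to keep the example short.

```python
from collections import deque

BLOCKS = ("Block1", "Block2")


def operators():
    """Yield (name, pre-conditions, add list, delete list) for every ground operator."""
    for x in BLOCKS:
        yield (f"PICKUP({x})",
               {("clear", x), ("ontable", x), ("handempty",)},
               {("holding", x)},
               {("clear", x), ("ontable", x), ("handempty",)})
        yield (f"PUTDOWN({x})",
               {("holding", x)},
               {("clear", x), ("ontable", x), ("handempty",)},
               {("holding", x)})
        for y in BLOCKS:
            if x != y:
                yield (f"STACK({x},{y})",
                       {("holding", x), ("clear", y)},
                       {("on", x, y), ("clear", x), ("handempty",)},
                       {("holding", x), ("clear", y)})
                yield (f"UNSTACK({x},{y})",
                       {("on", x, y), ("clear", x), ("handempty",)},
                       {("holding", x), ("clear", y)},
                       {("on", x, y), ("clear", x), ("handempty",)})


def plan(start, goal):
    """Breadth-first search from `start` to any state containing all `goal` facts."""
    start = frozenset(start)
    ops = list(operators())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, pre, add, delete in ops:
            if pre <= state:  # the operator applies only if its pre-conditions hold
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None


# Hypothetical State-A: both blocks on the table. State-B: Block1 on Block2.
state_a = {("ontable", "Block1"), ("ontable", "Block2"),
           ("clear", "Block1"), ("clear", "Block2"), ("handempty",)}
state_b = {("on", "Block1", "Block2")}
print(plan(state_a, state_b))  # ['PICKUP(Block1)', 'STACK(Block1,Block2)']
```

Even in this toy setting the number of states grows quickly with the number of blocks, which hints at why the PLAN step is the computationally intensive part.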

ThemeA Self-Calibrated 16GHz Subsampling-PLL-Based 30μs Fast Chirp FMCW Modulator with 1.5GHz Bandwidth and 100kHz rms Error
Academic ConferenceInternational Solid-State Circuits Conference(ISSCC)
Technology CategoryDesign / Manufacturing / Operation
NameQ. Shi,
N. Markulic,
J. Craninckx (imec)
K. Bunsen (Sony Semiconductor Solutions Corporation)
Details

Frequency-modulated continuous-wave (FMCW) radars are critical for autonomous-driving applications. Sawtooth waveforms with short chirp time ( 4GHz) improves the range resolution to 3.75cm. To meet these requirements, fractional-N PLLs with a two-point-modulation (TPM) scheme are presented in [1-3] with a relative rms FM error of 0.05% when normalized to the chirping bandwidth. In conventional FMCW PLLs, the lowpass signal injection is implemented through division ratio modulation in the PLL feedback path. However, the divider power consumption and noise are not negligible. In [4], a subsampling GMSK modulator improves the EVM. By using a high-resolution DTC, the divider quantization noise contribution to the PM/FM error is eliminated. In this work, the frequency modulation bandwidth is extended to the GHz range and a 16GHz subsampling PLL (SS-PLL) is used for fast-chirp generation. Thanks to the TPM technique and self-calibration, this PLL takes 30μs to chirp a sawtooth waveform with a 1.5GHz BW (~9.5% of the carrier frequency). The rms FM error is 100kHz, which is 0.007% of BW.
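The two figures quoted above can be checked with a short calculation. The sketch below assumes the standard FMCW range resolution relation, c / (2B), and simply normalizes the rms frequency error to the chirp bandwidth; the function names are illustrative only.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(bandwidth_hz: float) -> float:
    """Standard FMCW range resolution: c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)


def relative_fm_error_percent(rms_error_hz: float, bandwidth_hz: float) -> float:
    """rms frequency error normalized to the chirp bandwidth, in percent."""
    return 100.0 * rms_error_hz / bandwidth_hz


print(f"{range_resolution_m(4e9) * 100:.2f} cm")             # 3.75 cm
print(f"{relative_fm_error_percent(100e3, 1.5e9):.4f} %")    # 0.0067 %
```

A 4GHz bandwidth gives roughly the 3.75cm resolution stated in the abstract, and 100kHz over 1.5GHz rounds to the quoted 0.007%.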

more
ThemeAutomotive Image Sensor for autonomous vehicle and adaptive driver assistance system
Academic ConferenceSymposia on VLSI Technology and Circuits (VLSI)
Technology CategoryImaging / Sensing
NameH. Matsumoto (Sony Semiconductor Solutions Corporation)
Details

Human vision is the most essential sensor for driving a vehicle. In place of human eyes, the CMOS image sensor is the best sensing device for recognizing objects and the environment around the vehicle. Image sensors are also used in various use cases such as driver and passenger monitoring in the vehicle cabin. For these use cases, some special functionalities and specifications are needed. In this session, the requirements for automotive image sensors, such as high dynamic range, flicker mitigation, and low noise, will be discussed. In the last part, the key technologies for utilizing image sensors, such as image recognition and computer vision, will be discussed.

more
ThemeHigh-Efficiency OLED Microdisplay with Microlens Array
Academic ConferenceThe Society for Information Display (SID)
Technology CategoryAudio / Visual
NameY. Motoyama (Sony Semiconductor Solutions Corporation)
Details

A high‐efficiency organic light‐emitting diode (OLED) microdisplay has been developed with several new technologies, including a microlens array. We focused on improving the out‐coupling efficiency and achieved three times higher efficiency than a conventional OLED. By using our developed technologies, it is possible to improve the maximum luminance from 1600 to 5000 cd/m2 while maintaining the same lifetime.

more
ThemeTechnologies for the Crystal LED Display System
Academic ConferenceThe Society for Information Display (SID)
Technology CategoryAudio / Visual
NameG. Biwa (Sony Corporation)
Details

We have developed a novel active matrix driving technology integrating RGB micro LEDs and a micro IC in each pixel for the Crystal LED display system. With precise tiling technology, a large-scale image with superior quality can be produced. We will present related key technologies in this paper.

more
ThemeA Hybrid Clock Tree with Multi-spine using Automated Design Methodology for Low Cost and Low Power Complex LSIs
Academic ConferenceDesign Automation Conference (DAC)
Technology CategoryDesign / Manufacturing / Operation
NameT. Hasegawa,
M. Yajima,
M. Fujiwara,
T. Saeki (Sony LSI Design Incorporated)

more
ThemeSub-pixel Architecture of CMOS Image Sensor Achieving over 120 dB Dynamic Range with less Motion Artifact Characteristics
Academic ConferenceInternational Image Sensor Workshop (IISW)
Technology CategoryImaging / Sensing
NameT. Asatsuma,
Y. Sakano,
S. Iida,
I. Yoshiba,
H. Mizuno,
T. Oka,
K. Yamaguchi,
A. Suzuki,
K. Suzuki,
M. Yamada,
Y. Tateshita,
K. Ohno (Sony Semiconductor Solutions Corporation)
M. Takami,
N. Ohba (Sony Semiconductor Manufacturing Corporation)
Details

"In recent years, real-time sensing has been creating new businesses and social changes, specifically in the internet of things and automotive fields. The accurate perception of moving objects and detection with high color reproducibility for all light conditions is a necessity. A conventional high dynamic range (HDR) technique uses the multiple exposure method. However, this technique causes motion artifacts depending on the sampling time difference of the moving objects. We have developed a CMOS image sensor with a new architecture. The characteristic of this sensor is that it has been designed with a sub-pixel architecture that contains a single large photodiode, a single small photodiode, and an in-pixel floating capacitor."

more
ThemeA Novel Interconnection Technology Using Ultra-Thin Under Barrier Metal for Multiple Chip-on-Chip Stacking Structure
Academic ConferenceElectronic Components and Technology Conference (ECTC)
Technology CategoryDevice / Material
NameT. Nakamura,
K. Shimizu,
M. Maehara,
T. Hayashi,
K. Akiyama,
J. Fujimagari,
H. Iwamoto (Sony Semiconductor Solutions Corporation)
T. Ohkubo,
A. Fujiwara (Sony Semiconductor Manufacturing Corporation)
Details

In this report, connectivity and reliability between barrier metal and SnAg solder were evaluated from the viewpoint of intermetallic compound (IMC) growth and reliability behavior. On the basis of previous public research, Co, Ti and Ta were selected as the barrier metal. On the top chip side, 22,000 SnAg solder bumps with 20 μmΦ / 40 μm pitch were formed. On the bottom chip side, Al pads with the selected barrier metal were fabricated as an electrode. Connectivity testing of the SnAg bump and barrier metal was carried out by Kelvin resistance, Daisy chain, and cross-sectional SEM. Reliability tests were carried out under high temperature and heat cycling conditions. The results of the connectivity verification revealed similar electric characteristics between SnAg solder and barrier in all barrier metal types. This successful demonstration of sub-micron thickness Co material as a barrier metal of SnAg solder bump connection will enable high reliability one-side soldering with reduced processing costs.

more
ThemeIndirect photon-counting x-ray imaging using CMOS Photon Detector (CPD)
Academic ConferenceThe International Society for Optics and Photonics (SPIE) Medical Imaging
Technology CategoryImaging / Sensing
NameT. Nishihara
M. Matsumura
O. Kumagai
T. Izawa (Sony Semiconductor Solutions Corporation)
H. Baba (Sony Global Manufacturing and Operations Corporation)
Details

"CMOS photon detectors (CPDs) are recently proposed photon sensing devices utilizing the latest CMOS image sensor (CIS) technologies. CPDs are non-electron-multiplying devices, whose pixels have a fully depleted photodiode and have a readout noise of sub-electron RMS even at room temperature. Using a 15μm pixel CPD test device coupled to a CsI(Tl) scintillator, we successfully obtained photon-count X-ray images. A Hamamatsu Photonics scintillator with 150μm CsI(Tl) layer coupled to fiber optic plate (FOP) of 3mm thickness is diced in dry condition and directly glued to the sensor surface. X-ray photons are injected from an X-ray tube with accelerating voltage of 30kV and 45kV using W target. Each X-ray photon creates a scintillation light spot in the captured images, where the injected position and photon energy are determined by integrating multiple pixel outputs at that spot. X-ray energy distributions were obtained at both 30kV and 45kV with reasonable differences. Modulation transfer function (MTF) of over 0.7 at 10LP/mm was achieved by mapping injected positions at 30kV. Photon-count images for slanted-edge MTF measurements as well as 10LP/mm of X-ray test chart were achieved. Those photon-count images were compared with conventional integral images obtained with the same sensing device. Both image types confirmed superior resolution with photon-counting. This indirect X-ray photon counting technique using CPD has a potential of getting critical X-ray information for medical applications by achieving accurate injected positions of X-ray photons and their energies simultaneously."

more
ThemeCharacterization of highly doped Si:P, Si:As and Si:P:As epi layers for Source/Drain epitaxy
Academic ConferenceInternational SiGe Technology and Device Meeting (ISTDM) / International Conference on Silicon Epitaxy and Heterostructures (ICSI)
Technology CategoryDevice / Material
NameE. Rosseel,
C. Porret,
B. Douhard,
J. Meersschaut,
A. Hikavyy,
R. Loo,
N. Horiguchi,
G. Pourtois (Imec)
M. Tirrito (Polytechnic University of Turin)
N. Nakazaki (Sony Semiconductor Solutions Corporation)
J. Tolle (ASM America)

more
ThemeA Self-Calibrated 16GHz Subsampling-PLL-Based 30μs Fast Chirp FMCW Modulator with 1.5GHz Bandwidth and 100kHz rms Error
Academic ConferenceIEEE Journal of Solid-State Circuits (JSSC)
Technology CategoryDevice / Material
NameQ. Shi,
N. Markulic,
J. Craninckx (Imec)
K. Bunsen (Sony Semiconductor Solutions Corporation)

more
ThemeSensor x DNN
Academic ConferenceIEEE International Conference on Computational Photography (ICCP)
Technology CategoryAI / Robotics
NameT. Mitsunaga (Sony Semiconductor Solutions Corporation)
Details

The recent dramatic evolution of image understanding and machine vision technologies has been made possible by deep neural networks (DNNs) and huge amounts of computing power. Now, the evolution is extending to the edge of the information network, where a huge number of sensors work with sensor signal processing. The presenter introduces a survey of recent DNN-based approaches for sensor signal processing and summarizes what we expect from the change in sensor signal processing.

more
ThemeExtremely Low Voltage Operatable On-Chip-Monitor-Test Circuit for Plasma Induced Damage using High Sensitivity Ring-VCO(Voltage Controlled Oscillator)
Academic ConferenceIEEE International Conference on Microelectronic Test Structures (ICMTS)
Technology CategoryDevice / Material
NameM. Tomita,
S. Mori,
S. Miyake,
K. Ogawa,
Y. Fukuzaki,
H. Ohnuma (Sony Semiconductor Solutions Corporation)
Details

We developed an on-chip-monitor-test circuit that measures Vth fluctuation due to plasma-induced damage (PID) during wafer processing, using an ordinary AC test at a low-Vdd operating condition. The circuit was fabricated in a 28nm process, and actual measurement experiments were carried out to confirm the measurement principle. We confirmed that it can operate at a low Vdd of 0.5 V, compared with 0.9 V for the previous circuit, at the same frequency. It can be adopted for low-power IoT products at 28nm CMOS and beyond.

more
ThemeLaser addressed full-color photo-quality rewritable sheets based on thermochromic systems with Leuco dyes
Academic ConferenceThe Society for Information Display (SID)
Technology CategoryDevice / Material
NameY. Kaino,
A. Shuto,
H. Mizuno,
S. Asaoka,
T. Ishida,
K. Takagi,
I. Takahashi,
Y. Oishi,
T. Kamei,
K. Nomoto (Sony Corporation)
K. Kurihara (Sony Home Entertainment & Sound Products Inc.),
H. Amago,
T. Takeuchi,
A. Tejima,
M. Watanabe (Sony Global Manufacturing & Operations Corporation)
Details

We have developed a laser‐addressed full‐color photographic quality rewritable sheet. The sheet was composed of a vertically stacked Cyan/Magenta/Yellow‐thermochromic system with a mixture of leuco dyes, developers and photothermal conversion agents in a polymer matrix. The sheet was simply manufactured by roll‐to‐roll (R2R) processes. Writing and erasing were performed by scan of near‐infrared laser light. It achieved full‐color photographic quality images with a wide color gamut with 70% coverage of the Specifications for Web Offset Publications (SWOP) standards and a high resolution of 426 ppi. Clear rewritability has also been confirmed. Non‐contact laser writing has other advantages in that it can create an image under a protection film, and it has form factor flexibility. We have developed a reliability model for high‐temperature storage and a light fastness test. This model showed a good agreement with experimental data, and the lifetime of an image was estimated to be over 8 years under ambient conditions. This technology will create applications for on‐demand rewritable image design while saving power and reducing the use of paper, which will eventually contribute to a sustainable society.

more
Theme3D integration technology for CMOS image sensors and future prospects
Academic ConferenceIEEE International Conference on Microelectronic Test Structures (ICMTS)
Technology CategoryImaging / Sensing
NameR. Nakamura (Sony Semiconductor Solutions Corporation)
Details

3D integration is a core technology for advanced devices. The CMOS image sensor (CIS) uses 3D integration technology most effectively and has progressed remarkably in recent years. In order to realize higher sensitivity and multi-functionality, many types of back-illuminated (BI) stacked CISs are currently in mass production. This talk will focus on recent progress in the 3D integration technology used in CIS devices, including (1) technology of stacked CIS and its evaluation (TSV, Cu-Cu bonding, etc.), (2) advantages and use cases of stacked CIS, and (3) future prospects of CIS devices.

more
ThemeThe Scaling of Cu-Cu Hybrid Bonding For High Density 3D Chip Stacking
Academic ConferenceElectron Devices Technology and Manufacturing (EDTM)
Technology CategoryDevice / Material
NameY. Kagawa,
S. Hida,
Y. Kobayashi,
K. Takahashi,
S. Miyanomae,
M. Kawamura,
H. Nakayama,
S. Kadomura (Sony Semiconductor Manufacturing Corporation)
H. Kawashima,
H. Yamagishi,
T. Hirano,
K. Tatani,
K. Ohno,
H. Iwamoto (Sony Semiconductor Solutions Corporation)

more
ThemeSensor x DNN
Academic ConferenceImage Sensors Europe
Technology CategoryAI / Robotics
NameT. Mitsunaga (Sony Semiconductor Solutions Corporation)
Details

Recent dramatic evolution of image understanding and machine vision technologies has been made by deep neural networks (DNNs) and huge amounts of computing power. For now, the evolution is extending to the edge of the information network where there are vast numbers of sensors working with the sensor signal processing. The presenter introduces a survey on the recent DNN-based approaches for sensor signal processing and summarizes what we expect from this evolution.

more
ThemeA Low Power Event-driven Back-illuminated Stacked CMOS Image sensor
Academic ConferenceAnnual World Congress of Smart Materials (WCSM)
Technology CategoryImaging / Sensing
Device / Material
NameH. Wakabayashi,
O. Kumagai,
A. Niwa,
H. Kato,
T. Ohyama,
T. Imoto,
M. Nakamizo,
T. Nishino,
T. Iinuma,
N. Kuzuya,
T. Wakano (Sony Semiconductor Solutions Corporation)
K. Hanzawa,
H. Murakami,
A. Bostamam,
F. Brady,
Y. Nitta (Sony Electronics Incorporated)
K. Hatsukawa (Sony LSI Design Incorporated)
S. Futami (Sony Depthsensing Solutions)

more
ThemeForthcoming Cross Point ReRAM
Academic ConferencePersistent Memory Summit
Technology CategoryDevice / Material
NameA. Tsutsui (Sony Semiconductor Solutions Corporation)

more
ThemeA 6.9-μm Pixel-Pitch Back-Illuminated Global Shutter CMOS Image Sensor With Pixel-Parallel 14-Bit Subthreshold ADC
Academic Conference2018 JSSC Best Paper Award
Commendation institutionsThe IEEE Journal of Solid-State Circuits
Technology CategoryImaging / Sensing
NameM. Sakakibara,
K. Ogawa,
S. Sakai,
Y. Tochigi,
K. Honda,
H. Kikuchi,
T. Wada,
Y. Kamikubo,
T. Miura,
M. Nakamizo,
N. Jyo,
R. Hayashibara,
S. Miyata,
S. Yamamoto,
Y. Ota,
H. Takahashi,
T. Taura,
Y. Oike,
K. Tatani,
T. Ezaki,
T. Hirayama (Sony Semiconductor Solutions Corporation)
Details

In this paper, we report on a back-illuminated, global shutter, CMOS image sensor (CIS) with a pixel-parallel, single-slope analog-to-digital converter (ADC). We adopted a digital bucket relay transfer with multistage flip-flop connection, a pixel unit Cu-Cu connection, and positive-feedback circuitry, to realize a 6.9-μm pixel-pitch, 1.46-Mpixel pixel-parallel ADC. By operating the comparator with a bias current in the subthreshold region of 7.74-111 nA, we succeeded in reducing the peak current during simultaneous ADC. In combination with an ADC standby operation, we succeeded in further reducing the pixel-parallel ADC power consumption. With these techniques, we realized a normalized figure of merit of 0.24 nJ·e-rms/step calculated by dividing the entire sensor power by the effective ADC resolution at a subthreshold current of 111 nA during 660 frames/s operation.

more
ThemeStacked, multi-functional CMOS image sensor structure
Medal of HonorMedal with Purple Ribbon
Commendation institutionsCabinet Office, Government of Japan
Technology CategoryImaging / Sensing
NameTaku Umebayashi (Sony Semiconductor Solutions Corporation)

more
ThemeDisplay of the Year Award
Academic ConferenceSID 2019 "Display Industry Awards"
Commendation institutionsThe Society for Information Display
Technology CategoryAudio / Visual
NameSony Imaging Products & Solutions Inc.
Sony Semiconductor Solutions Corporation
Sony Corporation
Details

Sony Electronics Inc. announced that its Crystal LED display system has been recognized by the Society for Information Display (SID) as one of the organization’s 2019 Display Industry Award (DIA) winners for “Display of the Year.”

more
ThemeCorporate Award
Academic ConferenceMIPI Alliance Membership Awards Program
Commendation institutionsMIPI Alliance
Technology CategoryDesign / Manufacturing / Operation
NameSony Corporation
Sony Semiconductor Solutions Corporation
Details

This award recognizes Sony's contribution to the development of inter-chip communication standards such as I3C*1, C-PHY*2, and CSI-2*3, as well as Sony's role as the AWG*4 chair in establishing ARD*5 and taking the lead in electromagnetic compatibility (EMC) testing for automotive coaxial cables, which was the first for a standardization organization.

*1: I3C: A general-purpose inter-chip communication standard with backward compatibility with I2C, a communication standard developed by Philips Semiconductor.
*2: C-PHY: A physical layer communication standard between application processors, cameras, and other devices.
*3: CSI-2: A protocol layer communication standard between application processors and cameras.
*4: Automotive Working Group (AWG): A working group responsible for formulating requirements for automotive interfaces.

more
ThemeLatency Compensation for Optical See-Through Head-Mounted with Scanned Display
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameH. Aga,
A. Ishihara,
K. Kawasaki,
M. Nishibe,
S. Kohara,
T. Ohara,
M. Fukuchi(Sony Corporation)
Details

The authors present the design and implementation of latency compensation techniques for an optical see-through head-mounted raster-scan display to realize augmented reality. The maximum registration error of 3D virtual objects is 0.03 degrees on the horizontal axis when the rolling motion of the user’s head is simulated.

more
ThemeA 0.68e-rms Random-Noise 121dB Dynamic-Range Sub-pixel architecture CMOS Image Sensor with LED Flicker Mitigation
Academic ConferenceIEEE International Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameS. Iida (Sony Semiconductor Solutions Corporation)
Details

This is a report of a CMOS image sensor with a sub-pixel architecture having a pixel pitch of 3 μm. The sensor achieves both ultra-low random noise of 0.68e-rms and high dynamic range of 121 dB in a single exposure, further realizing LED flicker mitigation.

more
ThemeBack-Illuminated 2.74 μm-Pixel-Pitch Global Shutter CMOS Image Sensor with Charge-Domain Memory Achieving 10k e- Saturation Signal
Academic ConferenceIEEE International Electron Devices Meeting (IEDM)
Technology CategoryImaging / Sensing
NameY. Kumagai (Sony Semiconductor Solutions Corporation)
Details

A 3208×2184 global shutter image sensor with back-illuminated architecture is implemented in a 90 nm/65 nm imaging process. The sensor, having 2.74 μm-pitch-pixels, achieves 10000 electrons full-well capacity and -80 dB parasitic light sensitivity. Furthermore, 13.8 e-/s dark current at 60°C and 1.85 e-rms random noise are obtained. In this paper, the structure of a pixel with memory along with saturation enhancement technology is described.

more
Theme4K HDR Workflow: from Capture to Display
Academic Conference2018 IEEE Broadcast Symposium (BTS)
Technology CategoryAudio / Visual
NameToshiyuki Ogura (Sony Visual Products Inc.),
Pablo Espinosa (Sony Visual Products America)
Details

In the past several years, TV picture quality has improved with the introduction of 4K and wide color gamut. In addition, HDR (High Dynamic Range) has recently advanced picture performance further, so TV images have become much more captivating. However, HDR seems difficult to understand correctly, since it involves a new concept, several different technologies, and standards that differ from the status quo. In this paper, the meaning, benefit, and ecosystem of HDR will be explained. How HDR affects the TV/display system, and what it requires of that system, will be explained to further one's understanding and utilization of HDR.

more
ThemeAdvantage of 10,000 cd/m2 with 8K Full-Spec HDR TV
Academic ConferenceIDW '18 Best Paper Award Winners
Commendation institutionsThe Institute of Image Information and Television Engineers
Technology CategoryAudio / Visual
NameToshiyuki Ogura (Sony Visual Products Inc.)
Details

The resolution and the contrast of a display seem independent, but they are actually closely related. The advantage of a higher resolution such as 8K is that the picture can be viewed from nearby. The advantage of combining it with higher luminance and contrast, however, is higher picture quality and expressive performance. This is why Sony demonstrated an 8K Full-Spec HDR TV with 10,000cd/m2 peak luminance at CES2018. The advantages and the technologies of this demo will be explained.

more
ThemeWearable Motion Tolerant PPG Sensor for Instant Heart Rate in Daily Activity
Academic Conference10th International Joint Conference on Biomedical Engineering Systems and Technologies
Technology CategoryImmersive Experience
NameT. Ishikawa, Y. Hyodo, K. Miyashita, K. Yoshifuji, Y. Komoriya, Y. Imai(Sony Corporation)
Details

A wristband-type PPG heart rate sensor capable of overcoming motion artifacts in daily activity and detecting heart rate variability has been developed together with a motion artifact cancellation framework. In this work, a motion artifact model in daily life was derived and motion artifacts caused by activity of arm, finger, and wrist were cancelled significantly. Highly reliable instant heart rate detection with high noise-resistance was achieved from noise-reduced pulse signals based on peak-detection and autocorrelation methods. The wristband-type PPG heart rate sensor with our motion artifact cancellation framework was compared with ECG instant heart rate measurement in both laboratory and office environments. In a laboratory environment, mean reliability (percentage of time within 10% error relative to ECG instant heart rate) was 86.5% and the one-day pulse-accuracy achievement rate based on time use data of body motions in daily life was 88.1% or approximately 21 hours. Our device and motion artifact cancellation framework enable continuous heart rate variability monitoring in daily life and could be applied to heart rate variability analysis and emotion recognition.
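
The autocorrelation step mentioned in this abstract can be illustrated with a generic period estimator. This is a minimal sketch under stated assumptions, not the authors' motion artifact cancellation framework. The function names, the 40–200 bpm search window, the 50 Hz sampling rate, and the clean synthetic sine used as a pulse signal are all hypothetical choices for the example.

```python
import math


def autocorrelation(x, lag):
    """Mean product of the signal with a copy of itself shifted by `lag` samples."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


def estimate_heart_rate_bpm(signal, fs_hz, min_bpm=40.0, max_bpm=200.0):
    """Pick the lag with the strongest self-similarity inside a plausible pulse range."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]  # remove the DC offset
    min_lag = int(fs_hz * 60.0 / max_bpm)
    max_lag = int(fs_hz * 60.0 / min_bpm)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(x, lag))
    return 60.0 * fs_hz / best_lag


# Synthetic, noise-free pulse wave at 1.25 Hz (75 beats per minute), 10 s at 50 Hz.
fs = 50.0
pulse = [math.sin(2.0 * math.pi * 1.25 * n / fs) for n in range(500)]
print(estimate_heart_rate_bpm(pulse, fs))
```

Restricting the lag search to a physiological range keeps the estimator from locking onto multiples of the true period. A real wrist signal would still need artifact removal before this step, which is the part the abstract is about.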

more
ThemeWatt-class 462 nm-Blue and 530 nm-Green Laser Diodes
Academic ConferenceInternational Conference on Nitride Semiconductor (ICNS)
CategoryAudio / Visual
NameM. Murayama,
Y. Nakayama,
K. Yamazaki,
Y. Hoshina,
H. Watanabe(Sony Corporation)
Details

In this research, watt‐class green and blue laser diodes, which are fabricated on free‐standing semipolar GaN and conventional c‐plane GaN substrates, respectively, are developed. Although several research groups have recently developed green laser diodes on semipolar GaN substrates, which have weaker piezoelectric fields and higher indium homogeneity in InGaN active regions compared to c‐plane GaN, watt‐level output power has yet to be achieved. By utilizing this semipolar plane, the first watt‐class green lasers at 530 nm are successfully fabricated and achieve maximum output powers in excess of 2 W, which to the best of our knowledge is the highest value reported for any GaN‐based green laser diode. A wall‐plug efficiency of 17.5% is realized at a current of 1.2 A under continuous‐wave operation, which corresponds to an optical output of approximately 1 W and is the highest value reported to date. In addition, high‐power and high‐efficiency blue laser diodes at 465 nm are successfully fabricated on conventional c‐plane GaN substrates. The output power and wall‐plug efficiency are 5.2 W and 37.0%, respectively, at a current of 3.0 A under continuous‐wave operation. These laser diodes are promising light sources meeting the ITU‐R Recommendation BT.2020 for future laser display applications.
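
Wall‐plug efficiency is the ratio of optical output power to electrical input power, so the figures quoted in this abstract imply an electrical input power and an operating voltage. The sketch below only performs that arithmetic. The function names are hypothetical, and the derived voltages are back-of-the-envelope values, not numbers reported by the authors.

```python
def electrical_input_power_w(optical_power_w, wall_plug_efficiency):
    """Electrical input implied by WPE = optical output / electrical input."""
    return optical_power_w / wall_plug_efficiency


def implied_voltage_v(optical_power_w, wall_plug_efficiency, current_a):
    """Operating voltage implied by the electrical input power and drive current."""
    return electrical_input_power_w(optical_power_w, wall_plug_efficiency) / current_a


# Blue diode: 5.2 W optical output at 37.0% efficiency and 3.0 A.
print(f"{electrical_input_power_w(5.2, 0.37):.2f} W")
print(f"{implied_voltage_v(5.2, 0.37, 3.0):.2f} V")
# Green diode: approximately 1 W optical output at 17.5% efficiency and 1.2 A.
print(f"{implied_voltage_v(1.0, 0.175, 1.2):.2f} V")
```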

more
ThemeNew Pixel Driving Circuit Using Self-discharging Compensation Method for High-Resolution OLED Micro Displays on a Silicon Backplane
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameK. Kimura,
Y. Onoyama,
T. Tanaka(Sony Corporation),
N. Toyomura,
H. Kitagawa(Sony Semiconductor Solutions Corporation)
Details

A new 4T2C pixel circuit formed on a silicon substrate is proposed to realize a high‐resolution 7.8‐μm pixel pitch AMOLED microdisplay. In order to achieve high luminance uniformity, the pixel circuit internally compensates for the Vth variation of the driving‐transistor MOSFET by using a self‐discharging method. Also presented are 0.5‐in Quad‐VGA and 1.25‐in wide Quad‐XGA microdisplays with the proposed pixel circuit.

more
ThemeHigh light extraction efficiency laser-phosphor light source
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameH.Morita,
Y. Maeda,
I.Kobayashi,
Y. Sato,
T. Nomura,
H.Kikuchi(Sony Corporation)
Details

We investigated a laser‐phosphor light source using an inorganic phosphor wheel. We experimentally confirmed that the light extraction efficiency of the inorganic phosphor wheel is 8% higher than that of a conventional phosphor wheel. In addition, we explain the cause of the efficiency improvement by presenting a fluorescence emission model.

more
ThemeA Plastic Holographic Waveguide Combiner for Light-weight and Highly-transparent Augmented Reality Glasses
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameT. Yoshida,
K. Tokuyama,
Y. Takai,
D. Tsukuda,
T. Kaneko,
N. Suzuki(Sony Corporation),
T. Anzai(Sony Global Manufacturing & Operations Corporation),
A. Yoshikaie,
K. Akutsu,
A. Machida(Sony Corporation)
Details

There is a high demand for light‐weight, stylishly designed augmented reality (AR) glasses with natural see‐through capabilities for the wide‐spread distribution of novel wearable devices to general consumers. We have successfully developed a unique production process for a holographic waveguide combiner that enables us to laminate holographic optical elements (HOEs) onto a plastic substrate with optical grade quality. The plastic substrate waveguide combiner has a number of advantages over conventional glass substrate combiners; the plastic substrate makes AR glasses lighter in weight and unbreakable. With the lamination process of HOEs, we can apply them to various designs to satisfy general customers’ wide range of style preferences. We also potentially made it possible for the holographic waveguide combiner to be produced in larger volumes at lower costs by using our novel roll‐to‐roll hologram recording and laminating process. In this paper, we present our approach to the plastic substrate HOE production process for AR glasses.

more
ThemeA Plastic Electrochromic Dimming Device for Augmented Reality Glasses
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameA. Machida,
K. Kadono,
Y. Ishii,
T. Kono,
H. Takanashi,
A. Nishiike(Sony Corporation),
H. Suzuki,
Y. Nakagawa,
K. Ando,
D. Kasahara,
A. Takeda(Sony Global Manufacturing & Operations Corporation),
K. Nomoto(Sony Corporation)
Details

We have developed an electrochromic dimming device on a plastic substrate with high transparency modulation from 70% to 10% and a bending radius below 30 mm. It withstands more than 10,000 endurance cycles as well as high‐temperature, high‐humidity conditions. Combining the device with AR glasses enables clear image visibility in various environments.

more
ThemeProjection and sensing technology of Xperia Touch
Academic ConferenceInternational Solid-State Circuits Conference (ISSCC)
CategoryAudio / Visual
NameK. Kaneda(Sony Corporation)

more
ThemeLateral optical confinement of GaN-based VCSEL using an atomically smooth monolithic curved mirror
Academic ConferenceScientific Reports, 8, 10350
CategoryAudio / Visual
NameT. Hamaguchi,
M. Tanaka,
J. Mitomo,
H. Nakajima,
M. Ito,
N. Kobayashi,
K. Fujii,
H. Watanabe,
S. Satou,
M. Ohara,
R. Koda,
H. Narui(Sony Corporation)
Details

We demonstrate the lateral optical confinement of GaN-based vertical-cavity surface-emitting lasers (GaN-VCSELs) with a cavity containing a curved mirror that is formed monolithically on a GaN wafer. The output wavelength of the devices is 441–455 nm. The threshold current is 40 mA (Jth = 141 kA/cm2) under pulsed current injection (Wp = 100 ns; duty = 0.2%) at room temperature. We confirm the lateral optical confinement by recording near-field images and investigating the dependence of threshold current on aperture size. The beam profile can be fitted with a Gaussian having a theoretical standard deviation of σ = 0.723 µm, which is significantly smaller than previously reported values for GaN-VCSELs with plane mirrors. Lateral optical confinement with this structure theoretically allows aperture miniaturization to the diffraction limit, resulting in threshold currents far lower than sub-milliamperes. The proposed structure enabled GaN-based VCSELs to be constructed with cavities as long as 28.3 µm, which greatly simplifies the fabrication process owing to longitudinal mode spacings of less than a few nanometers and should help the implementation of these devices in practice.
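
The remark about longitudinal mode spacing can be checked with the usual Fabry-Perot estimate Δλ ≈ λ² / (2·n_g·L). The sketch below is illustrative only. The group index of 2.5 is an assumed placeholder that does not come from the abstract, 450 nm is simply a wavelength inside the reported 441–455 nm range, and the function name is hypothetical.

```python
def mode_spacing_nm(wavelength_nm, cavity_length_um, group_index):
    """Fabry-Perot longitudinal mode spacing: lambda^2 / (2 * n_g * L)."""
    wavelength_m = wavelength_nm * 1e-9
    cavity_length_m = cavity_length_um * 1e-6
    spacing_m = wavelength_m ** 2 / (2.0 * group_index * cavity_length_m)
    return spacing_m * 1e9


# 28.3 um cavity from the abstract; the group index is an ASSUMED value.
print(f"{mode_spacing_nm(450.0, 28.3, 2.5):.2f} nm")
```

With these assumptions the spacing comes out on the order of a nanometer, which is consistent with the abstract's "less than a few nanometers".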

more
ThemeFeature Quantities of EEG to Characterize Human Internal States of Concentration and Relaxation
Academic Conference40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society
CategoryImmersive Experience
NameN. Sazuka,
Y. Komoriya,
T. Ezaki(Sony Corporation),
M. Uraguchi,
H. Ohira(Nagoya University)
Details

We propose novel feature quantities of electroencephalogram (EEG) to effectively detect internal states of concentration and relaxation in humans. An experiment using a wearable device showed that time-series feature quantities based on moving averages of EEG powers had higher effect sizes compared to conventional feature quantities of average EEG power per frequency band.

more
ThemeAtomic Diffusion Bonding for Optical Devices with High Optical Density
Academic ConferenceThe Electrochemical Society(AiMES)
CategoryDevice / Material
NameG. Yonezawa,
Y. Takahashi,
Y. Sato,
S. Abe,
M. Uomoto(Sony Corporation),
T. Shimatsu(Tohoku University)
Details

An inorganic bonding method providing 100% light transmittance at the bonded interface was proposed for fabricating devices with high optical density. First, we fabricated 5000 nm-thick SiO2 oxide underlayers on synthetic quartz glass wafers. After the film surfaces were polished to reduce surface roughness, the wafers with oxide underlayers were bonded using thin Ti films in vacuum at room temperature as a usual atomic diffusion process. After post annealing at 300 °C, 100% light transmittance at the bonded interface with the surface free energy at the bonded interface greater than 2 J/m2 was achieved. Dissociated oxygen from oxide layers probably enhanced Ti films oxidation, resulting in high light transmittance with high bonding strength attributable to the annealing. Using this bonding process, we fabricated a polarizing beam splitter and demonstrated that this bonding process is useful to fabricate devices with high optical density.

more
ThemeImpact of molecular orientation on energy level alignment at C60/pentacene interfaces
Academic ConferenceApplied Physics Letters, 113, 163302
CategoryDevice / Material
NameT. Nishi,
M. Kanno,
M. Kuribayashi,
Y. Nishida,
S. Hattori,
H. Kobayashi(Sony Corporation),
F. von Wrochem,
V. Rodin,
G. Nelles(Sony Europe Limited, Materials Science Laboratory),
S. Tomiya(Sony Corporation)
Details

The molecular orientation and the electronic structure at molecular donor/acceptor interfaces play an important role in the performance of organic optoelectronic devices. Here, we show that graphene substrates can be used as templates for tuning the molecular orientation of pentacene (PEN), selectively driving the formation of either face-on or edge-on arrangements by controlling the temperature of the substrate during deposition. The electronic structure and morphology of the two resulting C60/PEN heterointerfaces were elucidated using ultraviolet photoelectron spectroscopy and atomic force microscopy, respectively. While the C60/PEN (edge-on) interface exhibited a vacuum level alignment, the C60/PEN (face-on) interface exhibited a vacuum level shift of 0.2 eV, which was attributed to the formation of an interface dipole that resulted from polarization at the C60/PEN boundary.

ThemeGFDM with Different Subcarrier Bandwidths
Academic ConferenceIEEE Vehicular Technology Conference(VTC)
Category5G / IoT
NameY. Akai,
Y. Enjoji,
Y. Sanada(Keio University),
R. Kimura,
R. Sawai(Sony Corporation)
Details

This paper proposes a generalized frequency division multiplexing (GFDM) modulation scheme that transmits a signal with different subcarrier bandwidths. At the receiver, the GFDM signal is demodulated using a zero-forcing (ZF) algorithm or a minimum mean square error (MMSE) algorithm, and the BER performance of these algorithms is related to the condition number of the modulation matrix. This matrix can be optimized by adjusting the roll-off factor of the subcarrier filters. It is shown that the performance of the proposed GFDM is about 0.02 dB better than that with a roll-off factor of 0 at a BER of 10⁻³ on an AWGN channel. On multipath fading channels, on the other hand, the BER performance improves as the subcarrier bandwidth increases because of frequency diversity.

ThemeA Singularity-free GFDM Modulation scheme with Parametric Shaping Filter Sampling
Academic ConferenceIEEE Vehicular Technology Conference(VTC)
Category5G / IoT
NameA. Yoshizawa,
R. Kimura,
R. Sawai(Sony Corporation)
Details

A GFDM modulation scheme that circumvents the singularity issue of the GFDM transformation matrix is presented. The coefficients used for the pulse shaping filter are derived from the prototype filter depending on the parity of the number of subsymbols. The proposed pulse shaping filter design makes it possible to have a non-singular transformation matrix for an arbitrary number of subsymbols and/or subcarriers in sparse frequency-domain GFDM modulation.

ThemeScene depth profiling using Helmholtz Stereopsis
Academic ConferenceEuropean Conference on Computer Vision(ECCV)
CategoryImmersive Experience
NameH. Mori,
R. Koehle,
M. Kamm(Sony Europe Limited)
Details

Helmholtz stereopsis is a 3D reconstruction technique, capturing surface depth independent of the reflection properties of the material by using Helmholtz reciprocity. In this paper we are interested in studying the applicability of Helmholtz stereopsis for surface and depth profiling of objects and general scenes in the context of perspective stereo imaging. Helmholtz stereopsis captures a pair of reciprocal images by exchanging the position of light source and camera. The resulting image pair relates the image intensities and scene depth profile by a partial differential equation. The solution of this differential equation depends on the boundary conditions provided by the scene. We propose to limit the illumination angle of the light source, such that only mutually visible parts are imaged, resulting in stable boundary conditions. By simulation and experiment we show that a unique depth profile can be recovered for a large class of scenes including multiple occluding objects.

ThemeAutomatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Network
Academic ConferenceInternational Speech Communication Association(Interspeech)
CategoryAI / Robotics
NameN. Takahashi(Sony Corporation),
T. Naghibi,
B. Pfister,
L. V. Gool(ETH Zurich)
Details

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASR systems. However, they are not the optimal choice for several reasons, such as the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription to jointly estimate a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.

ThemeDeep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection
Academic ConferenceInternational Speech Communication Association(Interspeech)
CategoryAI / Robotics
NameN. Takahashi(Sony Corporation),
M. Gygli,
B. Pfister,
L. V. Gool(ETH Zurich)
Details

We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AED, we introduce a convolutional neural network (CNN) with a large input field. In contrast to previous works, this enables audio event detection to be trained end-to-end. Our architecture is inspired by the success of VGGNet and uses small 3×3 convolutions, but more depth than previous methods in AED. In order to prevent over-fitting and to take full advantage of the modeling capabilities of our network, we further propose a novel data augmentation method to introduce data variation. Experimental results show that our CNN significantly outperforms state-of-the-art methods including Bag of Audio Words (BoAW) and classical CNNs, achieving a 16% absolute improvement.

ThemeDynamic Sensitivity Control based on Two-Hop Farthest Terminal in Dense WLAN
Academic ConferenceIEEE Advanced Information Networking and Applications (AINA)
Category5G / IoT
NameT. Ohnuma,
H. Shigeno (Keio University),
T. Yamaura,
Y. Tanaka(Sony Corporation)
Details

The explosive growth in usage of IEEE 802.11 Wireless Local Area Networks (WLANs) has resulted in dense deployments and excessive interference between Basic Service Sets (BSSs) in urban areas such as apartment buildings and airports. Serious hidden/exposed terminal problems in high-density conditions negatively impact system throughput. To improve system efficiency, the IEEE 802.11ax Task Group (TG) has been assembled. The TG aims at realizing High-Efficiency WLAN (HEW) by utilizing spatial reuse technologies including Dynamic Sensitivity Control (DSC), Transmit Power Control (TPC), and BSS Color Filtering (BCF). In this paper, we propose a DSC scheme based on the two-hop farthest terminal for dense WLANs. This scheme resolves the hidden terminal problem with minimum transmission power. The propagation loss of the signal received from the associated communication pair is used to determine proper values of transmission power and carrier sense level. Furthermore, adjusting these parameters destination by destination can reduce exposed terminals effectively. We evaluate the performance of the proposed scheme in a residential building scenario with three criteria: aggregate throughput, fairness, and frame error rate. Simulation results show that the proposed scheme can improve aggregate downlink throughput and fairness compared to a previously proposed method in which the carrier sense level is set based on the expected RSSI level of packets received from the communicating pair. Furthermore, the improvement in frame loss rate implies that the hidden terminal problem can be solved by the proposed scheme.

ThemeMultichannel blind source separation based on non-negative tensor factorization in wavenumber domain
Academic ConferenceIEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
CategoryAudio / Visual
NameY. Mitsufuji(Sony Corporation),
S. Koyama,
H. Saruwatari(The University of Tokyo)
Details

Multichannel non-negative matrix factorization based on a spatial covariance model is one of the most promising techniques for blind source separation. However, this approach is not tractable for a large number of microphones, M, because the computational cost is of order O(M³) per time-frequency bin. To circumvent this drawback, we propose non-negative tensor factorization in the wavenumber domain, which reduces the cost to the order O(M). It transforms microphone signals into the spatial frequency domain, a technique that is commonly used for soundfield reconstruction. The proposed method is compared to several blind source separation (BSS) methods in terms of separation quality and computational cost.

ThemeLatent Model Ensemble with Auto-Localization
Academic ConferenceInternational Conference on Pattern Recognition(ICPR)
CategoryAI / Robotics
NameM. Sun,
T. X. Han(University of Missouri),
X. Xu,
M-C Liu,
A. K-Rostamabad(Sony Electronics Inc)
Details

Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene labeling, due to their large learning capacity and resistance to overfitting. For the image classification task, most of the current deep CNN-based approaches take the whole size-normalized image as input and have achieved quite promising results. Compared with the previously dominating approaches based on feature extraction, pooling, and classification, the deep CNN-based approaches mainly rely on the learning capability of deep CNN to achieve superior results: the burden of minimizing intra-class variation while maximizing inter-class difference is entirely dependent on the implicit feature learning component of deep CNN; we rely upon the implicitly learned filters and pooling component to select the discriminative regions, which correspond to the activated neurons. However, if the irrelevant regions constitute a large portion of the image of interest, the classification performance of the deep CNN, which takes the whole image as input, can be heavily affected. To solve this issue, we propose a novel latent CNN framework, which treats the most discriminative region as a latent variable. We can jointly learn the global CNN with the latent CNN to avoid the aforementioned big irrelevant region issue, and our experimental results show the evident advantage of the proposed latent CNN over traditional deep CNN: latent CNN outperforms the state-of-the-art performance of deep CNN on standard benchmark datasets including the CIFAR-10, CIFAR-100, MNIST and PASCAL VOC 2007 Classification dataset.

ThemeAggregate Interference Prediction Based on Back-Propagation Neural Network
Academic ConferenceIEEE Dynamic Spectrum Access Networks(DySPAN)
CategoryAI / Robotics
NameY. Zhao,
L. Shi (Beijing Jiaotong University),
X. Guo,
C. Sun (SCRL)
Details

In dynamic spectrum access (DSA) scenarios, dense and complex deployment (e.g., in nonuniform or unknown radio propagation environment) of secondary systems (SSs) will make aggregate interference estimation highly complicated or challenging for reliable primary system (PS) protection. To tackle this problem, a back-propagation (BP) neural network based aggregate interference prediction method is proposed and evaluated via simulations. This paper also gives design guidelines of BP neural network appropriate for aggregate interference prediction via revealing the impact of several key factors on the prediction accuracy, such as the number of input parameters to the neural network, the coordinate system in use, and the number of hidden neurons.

ThemeMultimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Academic ConferenceEmpirical Methods in Natural Language Processing(EMNLP)
CategoryAI / Robotics
NameA. Fukui(Sony Corporation/University of California, Berkeley),
D. H. Park,
D. Yang(University of California, Berkeley),
A. Rohrbach(University of California, Berkeley/Max Planck Institute of Technology),
T. Darrell,
M. Rohrbach(University of California, Berkeley)
Details

Modeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations. We hypothesize that these methods are not as expressive as an outer product of the visual and textual vectors. As the outer product is typically infeasible due to its high dimensionality, we instead propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and expressively combine multimodal features. We extensively evaluate MCB on the visual question answering and grounding tasks. We consistently show the benefit of MCB over ablations without MCB. For visual question answering, we present an architecture which uses MCB twice, once for predicting attention over spatial features and again to combine the attended representation with the question representation. This model outperforms the state-of-the-art on the Visual7W dataset and the VQA challenge.
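The abstract does not describe how the outer product is approximated. As a rough illustration of compact bilinear pooling in general, the sketch below projects each vector with a Count Sketch and combines the two sketches by circular convolution, which equals a Count Sketch of the full outer product. The vectors, the output size `d`, and all function names are made up for this example, and the convolution is computed directly rather than with an FFT to keep the code dependency-free.

```python
import random


def make_hash(n, d, rng):
    """Draw a random bucket in [0, d) and a random sign in {-1, +1} per input index."""
    buckets = [rng.randrange(d) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def count_sketch(v, buckets, signs, d):
    """Project v to d dimensions: out[buckets[i]] += signs[i] * v[i]."""
    out = [0.0] * d
    for i, value in enumerate(v):
        out[buckets[i]] += signs[i] * value
    return out


def circular_convolve(a, b):
    """Direct O(d^2) circular convolution; an FFT gives the same result faster."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def sketch_of_outer_product(x, y, hx, sx, hy, sy, d):
    """Reference: Count Sketch of the explicit outer product x (outer) y."""
    out = [0.0] * d
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(hx[i] + hy[j]) % d] += sx[i] * sy[j] * xi * yj
    return out


rng = random.Random(0)
d = 8                                  # hypothetical output dimension
x = [1.0, -2.0, 0.5, 3.0]              # stand-in for a visual feature vector
y = [0.5, 4.0, -1.0]                   # stand-in for a textual feature vector
hx, sx = make_hash(len(x), d, rng)
hy, sy = make_hash(len(y), d, rng)

mcb = circular_convolve(count_sketch(x, hx, sx, d), count_sketch(y, hy, sy, d))
ref = sketch_of_outer_product(x, y, hx, sx, hy, sy, d)
```

Here `mcb` never materializes the 4×3 outer product, yet it matches `ref`, which does. With real feature sizes in the thousands, avoiding the explicit outer product is what makes the pooling practical.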

ThemeDomain Adaptation for Neural Networks by Parameter Augmentation
Academic ConferenceAssociation for Computational Linguistics(ACL)
CategoryAI / Robotics
NameY. Watanabe(Sony Corporation),
K. Hashimoto,
Y. Tsuruoka(The University of Tokyo)
Details

We propose a simple domain adaptation method for neural networks in a supervised setting. Supervised domain adaptation is a way of improving the generalization performance on the target domain by using the source domain dataset, assuming that both of the datasets are labeled. Recently, recurrent neural networks have been shown to be successful on a variety of NLP tasks such as caption generation; however, the existing domain adaptation techniques are limited to (1) fine-tuning the model parameters on the target dataset after training on the source dataset, or (2) designing the network to have dual outputs, one for the source domain and the other for the target domain. Reformulating the idea of the domain adaptation technique proposed by Daume (2007), we propose a simple domain adaptation method, which can be applied to neural networks trained with a cross-entropy loss. On captioning datasets, we show performance improvements over other domain adaptation methods.
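The abstract does not spell out the reformulation, so the sketch below shows only the original feature-augmentation idea from Daume (2007) that the paper builds on, not the method proposed in the paper itself. Each input is copied into a shared block plus a domain-specific block, so a linear model can learn shared and per-domain weights. The function names and the toy vectors are illustrative.

```python
def augment(features, domain):
    """Map x to [shared, source-only, target-only] blocks (Daume-style augmentation)."""
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError("domain must be 'source' or 'target'")


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


x1 = [1.0, 2.0]
x2 = [3.0, -1.0]

same_domain = dot(augment(x1, "source"), augment(x2, "source"))
cross_domain = dot(augment(x1, "source"), augment(x2, "target"))
```

In the augmented space, two examples from the same domain have twice the similarity of the same two examples taken from different domains (`same_domain == 2 * dot(x1, x2)`, `cross_domain == dot(x1, x2)`), which is how the mapping lets in-domain data count for more.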

ThemeOne-shot Learning Gesture Recognition Based on Evolution of Discrimination with Successive Memory
Academic ConferenceInternational Conference of Intelligent Robotic and Control Engineering (IRCE)
CategoryAI / Robotics
NameX. Li,
S. Qin (BUAA School of Automation Science and Electrical Engineering),
Kuanhong Xu,
Zhongying Hu (SCRL)
Details

In this paper, a one-shot learning gesture recognition algorithm based on evolution of discrimination with successive memory is presented. It utilizes the transferability of a large-scale pre-trained DNN (Deep Neural Network) gesture recognition model, together with distance discrimination, to carry out high-performance recognition with evolutionary discrimination. Our scheme works as follows. First, a DNN gesture recognition model is trained in advance on a sample set with 19 classes of the BSG dataset, yielding a transferable model with a powerful feature extractor. Second, the transferable extractor is used to extract features of the labeled root samples and of the test samples for one-shot learning gesture recognition, achieving high-performance feature extraction and structured arraying. Finally, discriminative recognition is carried out using the Euclidean distance between the root features and the test features. Meanwhile, a mechanism for updating and evolving the root feature memory is built and used in one-shot learning gesture recognition to enhance recognition performance. Software for online one-shot learning gesture recognition, aimed at practical applications, is designed and developed to achieve fast response and high recognition accuracy. A series of experiments on the additional 10 classes of the BSG dataset is conducted to verify and validate the performance advantages of the proposed algorithm.
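The distance-based recognition step can be sketched as a nearest-root classifier over extracted features. The abstract does not say how the root feature memory is updated, so the moving-average rule in `update_root`, the `rate` parameter, the gesture labels, and the two-dimensional features are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(feature, roots):
    """Return the label whose stored root feature is closest to the test feature."""
    return min(roots, key=lambda label: euclidean(feature, roots[label]))


def update_root(roots, label, feature, rate=0.2):
    """Hypothetical memory update: nudge the root feature toward the new sample."""
    roots[label] = [(1 - rate) * r + rate * f for r, f in zip(roots[label], feature)]


# One labeled root sample per gesture class (features would come from the DNN extractor).
roots = {"wave": [1.0, 0.0], "point": [0.0, 1.0]}
test_feature = [0.8, 0.3]

label = classify(test_feature, roots)
update_root(roots, label, test_feature)
```

With these toy values the test feature is recognized as `"wave"`, and the stored root for that class moves slightly toward the new sample, so later comparisons reflect what has been seen so far.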

ThemeComparison between Force-controlled Skin Deformation Feedback and Hand-Grounded Kinesthetic Force Feedback for Sensory Substitution
Academic ConferenceIEEE Robotics and Automation Letters (RA-L) with ICRA2018 option
CategoryAI / Robotics
NameY. Kamikawa(Sony Corporation)
Details

Teleoperation and virtual reality systems benefit from force sensory substitution when kinesthetic force feedback devices are infeasible due to stability or workspace limitations. We compared the performance of sensory substitution when it is provided through a cutaneous method (skin deformation feedback) and a kinesthetic method (hand-grounded force feedback). For skin deformation feedback, we used a new force-controlled tactile sensory substitution device with the ability to provide tangential and normal force directly to the finger pad. Three-axis force control with 15 Hz bandwidth was achieved using a delta mechanism and three-axis force sensor. For hand-grounded force feedback, forces were grounded against the palm. As a control, world-grounded force feedback was provided using a three-degree-of-freedom kinesthetic force feedback device. Study participants were able to match a reference world-grounded force better with hand-grounded kinesthetic force feedback than with skin deformation feedback. Participants were also able to apply more accurate and precise forces with hand-grounded kinesthetic force feedback than with skin deformation feedback. Conversely, skin deformation feedback resulted in the lowest error during initial force adjustment. These experiments demonstrate relative advantages and disadvantages of skin deformation and hand-grounded kinesthetic force feedback for force sensory substitution.

ThemeLatency and Refresh Rate on Force Perception via Sensory Substitution by Force-Controlled Skin Deformation Feedback 
Academic ConferenceIEEE International Conference on Robotics and Automation(ICRA)
CategoryAI / Robotics
NameZ. A. Zook,
A. M. Okamura(Stanford University),
Y. Kamikawa(Sony Corporation)
Details

Latency and refresh rate are known to adversely affect human force perception in bilateral teleoperators and virtual environments using kinesthetic force feedback, motivating the use of sensory substitution of force. The purpose of this study is to quantify the effects of latency and refresh rate on force perception using sensory substitution by skin deformation feedback. A force-controlled skin deformation feedback device was attached to a 3-degree-of-freedom kinesthetic force feedback device used for position tracking and gravity support. A human participant study was conducted to determine the effects of latency and refresh rate on perceived stiffness and damping with skin deformation feedback. Participants compared two virtual objects: a comparison object with stiffness or damping that could be tuned by the participant, and a reference object with either added latency or reduced refresh rate. Participants modified the stiffness or damping of the tunable object until it resembled the stiffness or damping of the reference object. We found that added latency and reduced refresh rate both increased perceived stiffness but had no effect on perceived damping. Specifically, participants felt significantly different stiffness when the latency exceeded 300 ms and the refresh rate dropped below 16.6 Hz. The impact of latency and refresh rate on force perception via skin deformation feedback was significantly less than what has been previously shown for kinesthetic force feedback.

ThemeMagnified Force Sensory Substitution for Telemanipulation via Force-Controlled Skin Deformation
Academic ConferenceIEEE International Conference on Robotics and Automation(ICRA)
CategoryAI / Robotics
NameY. Kamikawa(Sony Corporation)
Details

Teleoperation systems could benefit from force sensory substitution when kinesthetic force feedback systems are too bulky or expensive, and when they cause instability by magnifying force feedback. We aim to magnify force feedback using sensory substitution via force-controlled tactile skin deformation, using a device with the ability to provide tangential and normal force directly to the fingerpads. The sensory substitution device is able to provide skin deformation force feedback over ten times the maximum stable kinesthetic force feedback on a da Vinci Research Kit teleoperation system. We evaluated the effect of this force magnification in two experimental tasks where the goal was to minimize interaction force with the environment. In a peg transfer task, magnified force feedback using sensory substitution improved participants’ performance for force magnifications up to ten times, but decreased performance for higher force magnifications. In a tube connection task, sensory substitution that doubled the force feedback maximized performance; there was no improvement at the larger magnifications. These experiments demonstrate that magnified force feedback using sensory substitution via force-controlled skin deformation feedback can decrease applied forces similarly to magnified kinesthetic force feedback during teleoperation.

ThemeAttention-based Convolutional Neural Networks for Sentence Classification
Academic ConferenceInternational Speech Communication Association(Interspeech)
CategoryAI / Robotics
NameZ. Zhao,
Y. Wu(Sony(China)Limited)
Details

Sentence classification is one of the foundational tasks in spoken language understanding (SLU) and natural language processing (NLP). In this paper we propose a novel convolutional neural network (CNN) with an attention mechanism to improve the performance of sentence classification. In a traditional CNN, it is not easy to effectively encode long-term contextual information and correlations between non-consecutive words. In contrast, our attention-based CNN is able to capture these kinds of information for each word without any external features. We conducted experiments on various public and in-house datasets. The experimental results demonstrate that our proposed model significantly outperforms the traditional CNN model and achieves competitive performance with models that exploit rich syntactic features.

ThemeAffinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding
Academic ConferenceIEEE Computer Vision and Pattern Recognition(CVPR)
CategoryAI / Robotics
NameM. Maire(Toyota Technological Institute at Chicago),
T. Narihira(Sony/University of California, Berkeley),
S. X. Yu(University of California, Berkeley)
Details

Spectral embedding provides a framework for solving perceptual organization problems, including image segmentation and figure/ground organization. From an affinity matrix describing pairwise relationships between pixels, it clusters pixels into regions, and, using a complex-valued extension, orders pixels according to layer. We train a convolutional neural network (CNN) to directly predict the pairwise relationships that define this affinity matrix. Spectral embedding then resolves these predictions into a globally-consistent segmentation and figure/ground organization of the scene. Experiments demonstrate significant benefit to this direct coupling compared to prior works which use explicit intermediate stages, such as edge detection, on the pathway from image to affinities. Our results suggest spectral embedding as a powerful alternative to the conditional random field (CRF)-based globalization schemes typically coupled to deep neural networks.

ThemeModeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model
Academic ConferenceAssociation for the Advancement of Artificial Intelligence(AAAI)
CategoryAI / Robotics
NameR. Nakahashi(Sony Corporation/Masachusetts Institute of Technology),
C. L. Baker,
J. B. Tenenbaum(Massachusetts Institute of Technology)
Details

Most human behaviors consist of multiple parts, steps, or subtasks. These structures guide our action planning and execution, but when we observe others, the latent structure of their actions is typically unobservable, and must be inferred in order to learn new skills by demonstration, or to assist others in completing their tasks. For example, an assistant who has learned the subgoal structure of a colleague’s task can more rapidly recognize and support their actions as they unfold. Here we model how humans infer subgoals from observations of complex action sequences using a nonparametric Bayesian model, which assumes that observed actions are generated by approximately rational planning over unknown subgoal sequences. We test this model with a behavioral experiment in which humans observed different series of goal-directed actions, and inferred both the number and composition of the subgoal sequences associated with each goal. The Bayesian model predicts human subgoal inferences with high accuracy, and significantly better than several alternative models and straightforward heuristics. Motivated by this result, we simulate how learning and inference of subgoals can improve performance in an artificial user assistance task. The Bayesian model learns the correct subgoals from fewer observations, and better assists users by more rapidly and accurately inferring the goal of their actions than alternative approaches.

ThemeLow Complexity Beamforming Training Method for mmWave Communications
Academic ConferenceIEEE Signal Processing Advances in Wireless Communications(SPAWC)
Category5G / IoT
NameF. Fellhauer(Sony Europe Limited/University of Stuttgart),
N. Loghin,
D. Ciochina,
T. Handte(Sony Europe Limited),
S. ten Brink(University of Stuttgart)
Details

This paper introduces a low-complexity method for antenna sector selection in mmWave Hybrid MIMO communication systems such as the IEEE 802.11ay amendment for Wireless LANs. The method is backward compatible with the methods already defined for the released mmWave standard IEEE 802.11ad. We introduce an extension of the 802.11ad channel model to support common Hybrid MIMO configurations. The proposed method is evaluated and compared to the theoretical limit of transmission rates found by exhaustive search. In contrast to state-of-the-art solutions, the presented method requires only sparse channel information. Numerical results show a significant complexity reduction in terms of the number of necessary trainings, while approaching the maximum achievable rate.

ThemeTerrestrial broadcast system using preamble and frequency division multiplexing
Academic ConferenceIEEE Broadband Multimedia Systems and Broadcasting(BMSB)
CategoryProfessional Solutions
NameL. Michael,
K. Takahashi,
Y. Shinohara,
L. Sakai,
M. Kan(Sony Corporation),
S. Atungsiri(Sony Europe Limited)
Details

Broadcast systems based on FDM (Frequency Division Multiplex) have the advantage of near-continuous demodulation of the broadcast signal, allowing accurate and continuous tracking of channel conditions, which is particularly useful for mobile reception. This has been employed in the ISDB-T standard used in Japan, Brazil and other countries. However, as designed in ISDB-T, the broadcast signal lacks the ability to send system parameters such as FFT size, GI size and so on before the receiver begins demodulation. The receiver must blindly estimate such system parameters before it can read the other detailed parameter information using the TMCC pilot carriers. This takes time, usually one frame or longer. This paper proposes a next-generation FDM system which retains the original advantages of FDM while adding a small signal (Preamble 1) that imparts essential information such as FFT size, GI size and pilot pattern to the receiver, enabling immediate demodulation of the broadcast signal based on known parameters rather than blind estimation. Following demodulation of the first preamble, demodulation of the second preamble (Preamble 2) gives immediate knowledge of all subsequent parameters, contributing to faster demodulation of the overall signal.

ThemeAENet: Learning Deep Audio Features for Video Analysis
Academic ConferenceIEEE Transactions on Multimedia
CategoryAI / Robotics
NameN. Takahashi(Sony Corporation),
M. Gygli,
L. Van Gool(ETH Zurich)
Details

We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear subword units that are present in speech. In order to incorporate this long-time frequency structure of audio events, we introduce a convolutional neural network (CNN) operating on a large temporal input. In contrast to previous works, this allows us to train an audio event detection system end to end. The combination of our network architecture and a novel data augmentation outperforms previous methods for audio event detection by 16%. Furthermore, we perform transfer learning and show that our model learned generic audio features, similar to the way CNNs learn generic features on vision tasks. In video analysis, combining visual features and traditional audio features, such as mel frequency cepstral coefficients, typically only leads to marginal improvements. Instead, combining visual features with our AENet features, which can be computed efficiently on a GPU, leads to significant performance improvements on action recognition and video highlight detection. In video highlight detection, our audio features improve the performance by more than 8% over visual features alone.

ThemeImproving music source separation based on deep neural networks through data augmentation and network blending
Academic ConferenceIEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
CategoryAI / Robotics
NameS. Uhlich,
M. Porcu,
F. Giron,
M. Enenkl,
T. Kemp(Sony Europe Limited),
N. Takahashi,
Y. Mitsufuji(Sony Corporation)
Details

This paper deals with the separation of music into individual instrument tracks, which is known to be a challenging problem. We describe two different deep neural network architectures for this task, a feed-forward and a recurrent one, and show that each of them by itself yields state-of-the-art results on the SiSEC DSD100 dataset. For the recurrent network, we use data augmentation during training and show that even simple separation networks are prone to overfitting if no data augmentation is used. Furthermore, we propose a blending of both neural network systems where we linearly combine their raw outputs and then perform a multi-channel Wiener filter post-processing. This blending scheme yields the best results that have been reported to date on the SiSEC DSD100 dataset.
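The blending step can be sketched as a per-bin linear combination of the two networks' magnitude estimates. As a simplification, the post-processing below uses single-channel Wiener-style soft masks in place of the multi-channel Wiener filter named in the abstract. The blending weight, the two-bin spectra, and the function names are invented for this example.

```python
def blend(est_a, est_b, weight):
    """Linearly combine two magnitude estimates: weight * a + (1 - weight) * b."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(est_a, est_b)]


def wiener_masks(source_mags, eps=1e-12):
    """Single-channel Wiener-style masks: each source's power over the total power per bin."""
    n_bins = len(next(iter(source_mags.values())))
    total = [sum(m[k] ** 2 for m in source_mags.values()) + eps for k in range(n_bins)]
    return {
        name: [m[k] ** 2 / total[k] for k in range(n_bins)]
        for name, m in source_mags.items()
    }


# Toy magnitude estimates per source from a feed-forward and a recurrent network.
feed_forward = {"vocals": [2.0, 0.0], "accompaniment": [1.0, 3.0]}
recurrent = {"vocals": [4.0, 2.0], "accompaniment": [1.0, 1.0]}
weight = 0.25  # hypothetical blending weight for the feed-forward network

blended = {
    name: blend(feed_forward[name], recurrent[name], weight) for name in feed_forward
}
masks = wiener_masks(blended)

mixture = [5.0, 4.0]  # toy mixture magnitudes
separated = {
    name: [g * x for g, x in zip(mask, mixture)] for name, mask in masks.items()
}
```

Because the masks sum to one in every bin, the separated sources add back up to the mixture, which is one reason a Wiener-style stage is a common final step after combining raw network outputs.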

ThemeSupervised monaural source separation based on autoencoders
Academic ConferenceIEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
CategoryAI / Robotics
Audio / Visual
NameK. Osako,
Y. Mitsufuji(Sony Corporation),
R. Singh,
B. Raj(Sony(China)Limited)
Details

In this paper, we propose a new supervised monaural source separation method based on autoencoders. We employ the autoencoder for dictionary training so that the nonlinear network can encode the target source with high expressiveness. The dictionary is trained on each target source without the mixture signal, which makes the system independent of the context in which the dictionaries will be used. In the separation process, the decoder portions of the trained autoencoders are used as dictionaries to find the activations in an iterative manner such that the sum of the decoder outputs approximates the original mixture. The results of instrument source separation experiments revealed that the separation performance of the proposed method was superior to that of NMF.

ThemeMulti-Scale Multi-Band DenseNets for Audio Source Separation
Academic ConferenceIEEE Workshop on Applications of Signal Processing to Audio and Acoustics(WASPAA)
CategoryAI / Robotics
Audio / Visual
NameN. Takahashi,
Y. Mitsufuji(Sony Corporation)
Details

This paper deals with the problem of audio source separation. To handle the complex and ill-posed nature of audio source separation, the current state-of-the-art approaches employ deep neural networks to obtain instrumental spectra from a mixture. In this study, we propose a novel network architecture that extends the recently developed densely connected convolutional network (DenseNet), which has shown excellent results on image classification tasks. To deal with the specific problem of audio source separation, an up-sampling layer, block skip connection and band-dedicated dense blocks are incorporated on top of DenseNet. The proposed approach takes advantage of long contextual information and outperforms state-of-the-art results in the SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio. Moreover, the proposed architecture requires significantly fewer parameters and considerably less training time compared with other methods.

more
ThemeHierarchical Recurrent Neural Network for Story Segmentation
Academic ConferenceInternational Speech Communication Association(Interspeech)
CategoryAI / Robotics
Audio / Visual
NameE. Tsunoo(The University of Edinburgh/Sony Corporation),  
O. Klejch,
P. Bell,
S. Renals(The University of Edinburgh)
Details

A broadcast news stream consists of a number of stories and each story consists of several sentences. We capture this structure using a hierarchical model based on a word-level Recurrent Neural Network (RNN) sentence modeling layer and a sentence-level bidirectional Long Short-Term Memory (LSTM) topic modeling layer. First, the word-level RNN layer extracts a vector embedding the sentence information from the given transcribed lexical tokens of each sentence. These sentence embedding vectors are fed into a bidirectional LSTM that models the sentence and topic transitions. A topic posterior for each sentence is estimated discriminatively and a Hidden Markov model (HMM) follows to decode the story sequence and identify story boundaries. Experiments on the topic detection and tracking (TDT2) task indicate that the hierarchical RNN topic modeling achieves the best story segmentation performance with a higher F1-measure compared to conventional state-of-the-art methods. We also compare variations of our model to infer the optimal structure for the story segmentation task. Index Terms: spoken language processing, recurrent neural network, topic modeling, story segmentation

more
ThemeHierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features
Academic ConferenceIEEE Automatic Speech Recognition and Understanding(ASRU)
CategoryAI / Robotics
Audio / Visual
NameE. Tsunoo(The University of Edinburgh/Sony Corporation),  
O. Klejch,
P. Bell,
S. Renals(The University of Edinburgh)
Details

A broadcast news stream consists of a number of stories, and finding the story boundaries automatically is an important task in news analysis. We capture the topic structure using a hierarchical model based on a Recurrent Neural Network (RNN) sentence modeling layer and a bidirectional Long Short-Term Memory (LSTM) topic modeling layer, with a fusion of acoustic and lexical features. Both features are accumulated with RNNs and trained jointly within the model to be fused at the sentence level. We conduct experiments on the topic detection and tracking (TDT4) task, comparing combinations of the two modalities trained with a limited amount of parallel data. Further, we utilize additional text data for training to refine our model. Experimental results indicate that the hierarchical RNN topic modeling takes advantage of the fusion scheme, especially with additional text training data, with a higher F1-measure compared to conventional state-of-the-art methods.

more
ThemeNon-Line-of-Sight Positioning for Mmwave Communications
Academic ConferenceIEEE International Workshop on Signal Processing Advances in Wireless Communications(SPAWC)
Category5G / IoT
NameF. Fellhauer(University of Stuttgart-Sony EuTEC Contractor),
N. Loghin (EuTEC),
J. Lassen,
A. Jaber (University of Stuttgart, Students)
Details

Using information about the wireless communication channel is a well-known approach to estimating a user's position. So far it has been shown that such methods can provide positioning information in line-of-sight (LOS) situations by estimating channel properties like time of flight, direction of arrival, and direction of departure of a link between a single access point and station. In this paper we focus on mmWave channels and propose a method that allows positioning in indoor scenarios even under non-line-of-sight conditions by exploiting the presence of scatterers. Further, we propose an approach to overcome the need for an angular reference, which is usually required to perform measurements of direction of arrival/departure and, therefore, limits practical applications. We investigate the influence of noisy temporal and spatial measurements on achievable performance with and without the presence of an angular reference. Results show that in the presence of an angular reference, positioning with the proposed method is possible with an accuracy better than 4 cm in 50 % of observations, which degrades to 8 cm without an angular reference.
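
For orientation, the LOS baseline mentioned above (time of flight plus one angle) can be sketched as follows. This is not the paper's non-line-of-sight method; it assumes a 2D scene, a known access point position, and an angle already expressed in a global frame, which is exactly the angular reference the paper tries to avoid. The function name `los_position` is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def los_position(ap_xy, tof_s, departure_angle_rad):
    """Estimate a station position from one LOS link.

    ap_xy: known (x, y) of the access point in metres.
    tof_s: one-way time of flight in seconds.
    departure_angle_rad: direction of departure at the access point,
        measured in the global frame (assumes an angular reference).
    """
    distance = SPEED_OF_LIGHT * tof_s
    x = ap_xy[0] + distance * math.cos(departure_angle_rad)
    y = ap_xy[1] + distance * math.sin(departure_angle_rad)
    return x, y


# A station 3 m away, straight along the positive y axis.
x, y = los_position((0.0, 0.0), 3.0 / SPEED_OF_LIGHT, math.pi / 2)
```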

more
ThemeMode Domain Spatial Active Noise Control Using Sparse Signal Representation
Academic ConferenceIEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
CategoryAudio / Visual
NameY. Maeno,
Y. Mitsufuji,
T. D. Abhayapala(ANU)
Details

Active noise control (ANC) over a sizeable space requires a large number of reference and error microphones to satisfy the spatial Nyquist sampling criterion, which limits the feasibility of practical realization of such systems. This paper proposes a mode-domain feedforward ANC method to attenuate the noise field over a large space while reducing the number of microphones required. We adopt a sparse reference signal representation to precisely calculate the reference mode coefficients. The proposed system consists of circular reference and error microphone arrays, which capture the reference noise signal and residual error signal, respectively, and a circular loudspeaker array to drive the anti-noise signal. Experimental results indicate that above the spatial Nyquist frequency, our proposed method performs well compared to conventional methods. Moreover, the proposed method can even reduce the number of reference microphones while achieving better noise attenuation.

more
ThemeMMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation
Academic ConferenceIEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
CategoryAI / Robotics
Audio / Visual
NameN. Takahashi,
N. Goswami,
Y. Mitsufuji(Sony Corporation)
Details

Deep neural networks have become an indispensable technique for audio source separation (ASS). It was recently reported that a variant of CNN architecture called MMDenseNet was successfully employed to solve the ASS problem of estimating source amplitudes, and state-of-the-art results were obtained for the DSD100 dataset. To further enhance MMDenseNet, here we propose a novel architecture that integrates long short-term memory (LSTM) in multiple scales with skip connections to efficiently model long-term structures within an audio context. The experimental results show that the proposed method outperforms MMDenseNet, LSTM and a blend of the two networks. The number of parameters and processing time of the proposed model are significantly less than those for simple blending. Furthermore, the proposed method yields better results than those obtained using ideal binary masks for a singing voice separation task.
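
The ideal binary mask used as a reference above is an oracle: it is computed from the ground-truth sources rather than estimated. A minimal sketch of the common definition follows, assuming magnitude spectrograms stored as nested lists indexed by frame and frequency bin; the exact variant evaluated in the paper may differ.

```python
def ideal_binary_mask(target_mag, interference_mag):
    """Return 1.0 where the target dominates a time-frequency bin, else 0.0.

    Both inputs are ground-truth magnitudes, so the mask is an oracle
    reference, not something available at inference time.
    """
    return [
        [1.0 if t > i else 0.0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_mag, interference_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Multiply the mixture magnitude by the mask bin by bin."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


vocals = [[0.9, 0.1], [0.2, 0.7]]
accompaniment = [[0.3, 0.5], [0.6, 0.1]]
mixture = [[1.2, 0.6], [0.8, 0.8]]  # toy values, not a real mixture
mask = ideal_binary_mask(vocals, accompaniment)
estimate = apply_mask(mask, mixture)
```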

more
ThemeImproving DNN-based Music Source Separation using Phase Features
Academic ConferenceThe 2018 Joint Workshop on Machine Learning for Music
CategoryAI / Robotics
Audio / Visual
NameJ. Muth(EPFL),
S. Uhlich,
F. Cardinaux,
Y. Mitsufuji(Sony Corporation)
Details

Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT phase and amplitude, we conjecture that derivatives of the phase are a better feature representation than the raw phase. We verify this conjecture experimentally and propose a new DNN architecture which combines amplitude and phase. This joint approach achieves a better signal-to-distortion ratio on the DSD100 dataset for all instruments compared to a network that uses only amplitude features. The bass instrument in particular benefits from the phase information.

more
ThemeCreating a Highly-Realistic "Acoustic Vessel Odyssey" Using Sound Field Synthesis with 576 Loudspeakers
Academic ConferenceAES Conference on Spatial Reproduction -Aesthetic and Science-
CategoryAudio / Visual
NameY. Mitsufuji,
A. Tomura,
K. Ohkuri(Sony Corporation)
Details

“Acoustic Vessel Odyssey” is a sound installation realizing the future of music by using Sony’s spatial audio technology called Sound Field Synthesis (SFS). It enables creators to simulate popping, moving and partitioning of sounds in one space. At the “Lost In Music” event, where we demonstrated “Acoustic Vessel Odyssey”, the immersive experience provided by SFS technology was further enhanced by a new, specially designed loudspeaker array consisting of 576 loudspeakers. The content was choreographed by sound artist Evala and is accompanied by a light installation created by digital media artists Kimchi and Chips. In this paper, we present the details of the system architecture as well as technical requirements of “Acoustic Vessel Odyssey”.

more
ThemePhaseNet: Discretized Phase Modeling with Deep Neural Networks for Speech Enhancement and Audio Source Separation
Academic ConferenceInternational Speech Communication Association(Interspeech)
CategoryAI / Robotics
Audio / Visual
NameN. Takahashi,
P. Agrawal(IISc),
N. Goswami,
Y. Mitsufuji(Sony Corporation)
Details

Previous research on audio source separation based on deep neural networks (DNNs) mainly focuses on estimating the magnitude spectrum of target sources, and typically the phase of the mixture signal is combined with the estimated magnitude spectra in an ad-hoc way. Although recovering the target phase is assumed to be important for the improvement of separation quality, it can be difficult to handle the periodic nature of the phase with the regression approach. Unwrapping the phase is one way to eliminate the phase discontinuity; however, it increases the range of values with the number of unwrappings, making it difficult for DNNs to model. To overcome this difficulty, we propose to treat the phase estimation problem as a classification problem by discretizing phase values and assigning class indices to them. Experimental results show that our classification-based approach 1) successfully recovers the phase of the target source in the discretized domain, 2) improves signal-to-distortion ratio (SDR) over the regression-based approach in both the speech enhancement task and the music source separation (MSS) task, and 3) outperforms state-of-the-art MSS. Index Terms: phase modeling, quantized phase, deep neural networks
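
The discretization idea can be sketched as below. The uniform binning over one period, the number of classes, and the helper names are assumptions for illustration; the abstract does not specify how the classes are laid out.

```python
import math

TWO_PI = 2.0 * math.pi


def phase_to_class(phase_rad, num_classes):
    """Map a phase in radians to a class index in [0, num_classes).

    The phase is first wrapped into [0, 2*pi), so values that differ by a
    full period land in the same class.
    """
    wrapped = phase_rad % TWO_PI
    index = int(wrapped / TWO_PI * num_classes)
    return min(index, num_classes - 1)  # guard against rounding at 2*pi


def class_to_phase(index, num_classes):
    """Return the centre of the class bin as the reconstructed phase."""
    return (index + 0.5) * TWO_PI / num_classes


k = phase_to_class(1.0, 8)
reconstructed = class_to_phase(k, 8)
```

With uniform bins the reconstruction error is at most half a bin width, that is, pi divided by the number of classes.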

more
ThemeMode-Domain Spatial Active Noise Control Using Multiple Circular Arrays
Academic ConferenceIEEE International Workshop on Acoustic Signal Enhancement(IWAENC)
CategoryAudio / Visual
NameY. Maeno,
Y. Mitsufuji(Sony Corporation),
P. N. Samarasinghe,
T. D. Abhayapala(ANU)
Details

Noise control and attenuation over a sizable space requires uniformly distributed microphones and loudspeakers, which limits the system’s viability in practice. In this paper, we propose a mode-domain active noise control (ANC) system using a simple microphone and loudspeaker array structure. We introduce a few circular microphone and loudspeaker arrays to first transform a sound field into circular expansion mode coefficients and then combine them to calculate 3D mode coefficients, which are then processed in an adaptive algorithm to attenuate an undesired noise field in 3D space. Experimental results indicate that our proposed method achieves noise attenuation performance comparable to that of a conventional method, which uses an infeasible array structure. Furthermore, the proposed method shows better noise attenuation performance than a conventional temporal frequency domain ANC system.

more
ThemeMicrophone Array Geometry for Two Dimensional Broadband Sound Field Recording
Academic ConferenceAudio Engineering Society International convention(AES)
CategoryAudio / Visual
NameW. Liao,
Y. Mitsufuji,
K. Osako,
K. Ohkuri(Sony Corporation)
Details

Sound field recording with arrays made of omnidirectional microphones suffers from an ill-conditioned problem due to the zero and small values of the spherical Bessel function. This article proposes a geometric design of a microphone array for broadband two dimensional (2D) sound field recording and reproduction. The design is parametric, with a layout having a discrete rotationally symmetric geometry composed of several geometrically similar subarrays. The actual parameters of the proposed layout can be determined for various acoustic situations to give optimized results. This design has the advantage that it simultaneously satisfies many important requirements of microphone arrays such as error robustness, operating bandwidth, and microphone unit efficiency.

more
ThemeContext-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems
Academic ConferenceIEEE Spoken Language Technology(SLT)
CategoryAI / Robotics
NameJ. Ohmura(Sony Corporation),
M. Eskenazi(Carnegie Mellon University)
Details

Dialog response ranking is used to rank response candidates by considering their relation to the dialog history. Although researchers have addressed this concept for open-domain dialogs, little attention has been paid to task-oriented dialogs. Furthermore, no previous studies have analyzed whether response ranking can improve the performance of existing dialog systems in real human-computer dialogs with speech recognition errors. In this paper, we propose a context-aware dialog response re-ranking system. Our system re-ranks responses in two steps: (1) it calculates matching scores for each candidate response and the current dialog context; (2) it combines the matching scores and a probability distribution of the candidates from an existing dialog system for response re-ranking. By using neural word embedding-based models and handcrafted or logistic regression-based ensemble models, we have improved the performance of a recently proposed end-to-end task-oriented dialog system on real dialogs with speech recognition errors.
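
Step (2) can be pictured with a minimal sketch. It assumes a simple linear interpolation with a hypothetical weight `alpha`; the paper itself uses handcrafted or logistic regression-based ensembles, whose details are not given in the abstract.

```python
def rerank(candidates, system_probs, match_scores, alpha=0.5):
    """Re-rank response candidates by a combined score.

    system_probs: probability of each candidate from the existing dialog system.
    match_scores: context-response matching score for each candidate.
    alpha: hypothetical interpolation weight (an assumption, not from the paper).
    """
    combined = {
        c: alpha * match_scores[c] + (1.0 - alpha) * system_probs[c]
        for c in candidates
    }
    return sorted(candidates, key=lambda c: combined[c], reverse=True)


candidates = ["ask_area", "offer_restaurant", "say_goodbye"]
system_probs = {"ask_area": 0.5, "offer_restaurant": 0.3, "say_goodbye": 0.2}
match_scores = {"ask_area": 0.1, "offer_restaurant": 0.9, "say_goodbye": 0.2}
ranking = rerank(candidates, system_probs, match_scores)
```

Here the matching score promotes the second candidate above the dialog system's own top choice.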

more
ThemeHigh-Brightness Solid-State Light Source for 4K Ultra-Short-Throw Projector
Academic ConferenceThe Society for Information Display(SID)
CategoryDevice / Material
NameY. Maeda(Sony Semiconductor Solutions Corporation)
Details

We have developed technologies for a high‐output light source consisting of blue laser diodes and a reflective phosphor wheel for a next‐generation 4K Ultra‐Short‐Throw Projector, and have achieved a fluorescence output of 87 W. As far as we know, it is the highest fluorescence output for projectors. We adopted a newly developed phosphor cooling mechanism and an inorganic binder for high reliability of the phosphor wheel. As a result, no deterioration in the phosphor wheel could be observed over a time period of 7,500 hours. In this paper, we report on these light‐source technologies for achieving high output and high reliability.

more
ThemeHigh-Luminance Monochromatic See-Through Eyewear Display with Volume Hologram
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameT. Oku(Sony Semiconductor Solutions Corporation)

more
ThemeImprovement of Light-Extraction Efficiency of a Laser-Phosphor Light Source
Academic ConferenceThe Society for Information Display(SID)
CategoryDevice / Material
NameH. Morita(Sony Semiconductor Solutions Corporation)
Details

We investigated a laser‐phosphor light source using an inorganic phosphor wheel. We experimentally confirmed that the light‐extraction efficiency of the inorganic phosphor wheel is 8% higher than that of a conventional phosphor wheel. In addition, we explain the cause of the efficiency improvement using a fluorescence emission model.

more
ThemeDistinguished Paper: New Pixel-Driving Circuit Using Self-Discharging Compensation Method for High-Resolution OLED Microdisplays on a Silicon Backplane
Academic ConferenceThe Society for Information Display(SID)
CategoryAudio / Visual
NameK. Kimura(Sony Semiconductor Solutions Corporation)
Details

A new 4T2C pixel circuit formed on a silicon substrate is proposed to realize a high‐resolution 7.8‐μm pixel pitch AMOLED microdisplay. In order to achieve high luminance uniformity, the pixel circuit internally compensates for the Vth variation of the driving‐transistor MOSFET by using a self‐discharging method. Also presented are 0.5‐in Quad‐VGA and 1.25‐in wide Quad‐XGA microdisplays with the proposed pixel circuit.

more
ThemeDistinguished Paper: 4032-ppi High-Resolution OLED Microdisplay
Academic ConferenceThe Society for Information Display(SID)
CategoryDevice / Material
NameT. Fujii(Sony Semiconductor Solutions Corporation)
Details

A 0.5 inch UXGA OLED microdisplay has been developed with a 6.3 μm pixel pitch. In addition to a high resolution of 4032 ppi, a high frame rate, low power consumption, a wide viewing angle and high luminance have been achieved. This newly developed OLED microdisplay is suitable for Near‐to‐Eye display applications, especially electronic viewfinders.
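
The quoted figures are mutually consistent, as a short arithmetic check shows. It assumes square pixels and the usual UXGA format of 1600 × 1200 pixels.

```python
import math

MICRONS_PER_INCH = 25_400.0


def pixels_per_inch(pitch_um):
    """Pixel density implied by a pixel pitch given in micrometres."""
    return MICRONS_PER_INCH / pitch_um


def diagonal_inches(width_px, height_px, pitch_um):
    """Panel diagonal in inches for square pixels of the given pitch."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px * pitch_um / MICRONS_PER_INCH


ppi = pixels_per_inch(6.3)                  # roughly 4032 ppi
diagonal = diagonal_inches(1600, 1200, 6.3)  # roughly 0.5 inch
```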

more
ThemeFour-Directional Pixel-Wise Polarization CMOS Image Sensor Using Air-Gap Wire Grid on 2.5-μm Back-Illuminated Pixels
Academic ConferenceIEEE International Electron Devices Meeting(IEDM)
CategoryImaging / Sensing
NameT. Yamazaki(Sony Semiconductor Solutions Corporation)
Details

Polarization information is useful in highly functional imaging. This paper presents a four-directional pixel-wise polarization CMOS image sensor using an air-gap wire grid on 2.5-μm back-illuminated pixels. The fabricated air-gap wire grid polarizer achieved a transmittance of 63.3 % and an extinction ratio of 85 at 550 nm, outperforming conventional polarization sensors. The pixel-wise polarizers fabricated with the wafer process on back-illuminated image sensors exhibit good oblique-incidence characteristics, even with small polarization pixels of 2.5 μm. The proposed image sensor enables various megapixel fusion-imaging applications, such as surface reflection reduction, highly accurate depth mapping, and condition-robust surveillance.
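
Four polarizer directions are enough to recover the linear Stokes parameters at each pixel group. The sketch below uses the standard textbook relations and assumes polarizer angles of 0, 45, 90 and 135 degrees with ideal polarizers; the abstract does not describe the sensor's actual processing.

```python
import math


def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-filtered intensities."""
    s0 = (i0 + i45 + i90 + i135) / 2.0  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2


def dolp_aolp(i0, i45, i90, i135):
    """Degree and angle (radians) of linear polarization."""
    s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


# Fully polarized light along 0 degrees (Malus's law gives 1, 0.5, 0, 0.5).
polarized = dolp_aolp(1.0, 0.5, 0.0, 0.5)
# Unpolarized light passes equally through every polarizer direction.
unpolarized = dolp_aolp(0.5, 0.5, 0.5, 0.5)
```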

more
ThemeNovel Stacked CMOS Image Sensor with Advanced Cu2Cu Hybrid Bonding
Academic ConferenceIEEE International Electron Devices Meeting(IEDM)
CategoryImaging / Sensing
NameY. Kagawa(Sony Semiconductor Solutions Corporation)
Details

We have successfully mass-produced novel stacked back-illuminated CMOS image sensors (BI-CIS). In the new CIS, we introduced advanced Cu2Cu hybrid bonding that we had developed. The electrical test results showed that our highly robust Cu2Cu hybrid bonding achieved remarkable connectivity and reliability. The performance of the image sensor was also investigated, and our novel stacked BI-CIS showed favorable results.

more
ThemeNear-infrared Sensitivity Enhancement of a Back-illuminated Complementary Metal Oxide Semiconductor Image Sensor with a Pyramid Surface for Diffraction Structure
Academic ConferenceIEEE International Electron Devices Meeting(IEDM)
CategoryImaging / Sensing
NameI. Oshiyama(Sony Semiconductor Solutions Corporation)
Details

We demonstrated the near-infrared (NIR) sensitivity enhancement of back-illuminated complementary metal oxide semiconductor image sensors (BI-CIS) with a pyramid surface for diffraction (PSD) structures on crystalline silicon and deep trench isolation (DTI). The incident light diffracted on the PSD because of the strong diffraction within the substrate, resulting in a quantum efficiency of more than 30% at 850 nm. By using a special treatment process and DTI structures, without increasing the dark current, the amount of crosstalk to adjacent pixels was decreased, providing resolution equal to that of a flat structure. Testing of the prototype devices revealed that we succeeded in developing unique BI-CIS with high NIR sensitivity.

more
ThemeAn Experimental CMOS Photon Detector with 0.5e- RMS Temporal Noise and 15μm pitch Active Sensor Pixels
Academic ConferenceIEEE International Electron Devices Meeting(IEDM)
CategoryImaging / Sensing
NameT. Nishihara(Sony Semiconductor Solutions Corporation)
Details

This is the first reported non-electron-multiplying CMOS Image Sensor (CIS) photon-detector for replacing Photo Multiplier Tubes (PMT). 15 μm pitch active sensor pixels with complete charge transfer and readout noise of 0.5 e- RMS are arrayed and their digital outputs are summed to detect micro light pulses. Successful proof of radiation counting is demonstrated.

more
ThemePixel/DRAM/logic 3-layer stacked CMOS image sensor technology
Academic ConferenceIEEE International Electron Devices Meeting(IEDM)
CategoryImaging / Sensing
NameH. Tsugawa(Sony Semiconductor Solutions Corporation)
Details

We developed a CMOS image sensor (CIS) chip with a stacked pixel/DRAM/logic structure. In this CIS chip, three Si substrates are bonded together, and each substrate is electrically connected by two-stacked through-silicon vias (TSVs) through the CIS or dynamic random access memory (DRAM). We obtained low resistance, low leakage current, and high reliability characteristics of these TSVs. Connecting metal with TSVs through DRAM can be used as low resistance wiring for a power supply. The Si substrate of the DRAM can be thinned to 3 μm, and its memory retention and operation characteristics are sufficient for specifications after thinning. With this stacked CIS chip, it is possible to achieve less rolling shutter distortion and produce super slow motion video.

more
ThemeAn 8.3M‐pixel 480fps Global‐Shutter CMOS Image Sensor with Gain‐Adaptive Column ADCs and 2‐on‐1 Stacked Device Structure
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameY. Oike(Sony Semiconductor Solutions Corporation)
Details

A 4K2K 480 fps global-shutter CMOS image sensor has been developed with super 35 mm format. This sensor employs newly developed gain-adaptive column ADCs to attain a dark random noise of 140 μV rms for the full-scale readout of 923 mV. An on-chip online correction of the error between two switchable gains maintains the nonlinearity of output image within 0.18 %. The 16-channel output interfaces with 4.752 Gbps/ch are implemented in 2 diced logic chips stacked on a sensor chip with 38K micro bumps.
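
Two derived figures follow directly from the numbers above; neither is stated in the abstract. The sketch computes the aggregate output rate and the ratio of full-scale readout to dark random noise in decibels, assuming the usual 20·log10 voltage ratio.

```python
import math


def aggregate_rate_gbps(channels, rate_per_channel_gbps):
    """Total interface throughput across all output channels."""
    return channels * rate_per_channel_gbps


def ratio_db(full_scale_v, noise_v_rms):
    """Voltage ratio expressed in decibels."""
    return 20.0 * math.log10(full_scale_v / noise_v_rms)


total_gbps = aggregate_rate_gbps(16, 4.752)  # 16 channels at 4.752 Gbps each
range_db = ratio_db(923e-3, 140e-6)          # 923 mV full scale vs 140 uV rms
```

Under these assumptions the interface carries about 76 Gbps in total and the full-scale-to-noise ratio is about 76 dB.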

more
ThemeAccelerating the Sensing World through Imaging Evolution
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameT. Nomoto(Sony Semiconductor Solutions Corporation)
Details

The evolution of CMOS image sensors (CIS) and the future prospect of a “Sensing” world utilizing advanced imaging technologies promise to improve our quality of life by sensing everything, everywhere, every time. Charge Coupled Device image sensors replaced video camera tubes, allowing the introduction of compact video cameras as consumer products. CIS now dominates the market for digital still cameras created by its predecessor and, with the advent of column-parallel ADCs and back-illuminated technologies, outperforms it. CISs achieve better signal-to-noise ratio, lower power consumption, and higher frame rate. Stacked CISs continue to enhance functionality and user experience in mobile devices, a market that currently comprises several billion units per year. CIS imaging technologies promise to accelerate the progress of a sensing world by continuously improving sensitivity, extending detectable wavelengths, and further improving depth resolution and temporal resolution.

more
Theme320x240 Back-Illuminated 10μm CAPD Pixels for High Speed Modulation Time-of-Flight CMOS Image Sensor
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameY. Kato(Sony Semiconductor Solutions Corporation)
Details

A 320×240 back-illuminated Time-of-Flight CMOS image sensor with 10μm CAPD pixels has been developed. The back-illuminated (BI) pixel structure maximizes the fill factor, allows for flexible transistor position and makes the light path independent of the metal layer. In addition, the CAPD pixel, which is optimized for high speed modulation, results in 80% modulation contrast at 100MHz modulation frequency.
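
To see why the modulation frequency matters, the standard continuous-wave time-of-flight relations can be sketched as below. The abstract does not describe the depth computation, so the phase-to-depth formula is the textbook one and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def unambiguous_range_m(f_mod_hz):
    """Largest distance measurable before the phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * f_mod_hz)


def depth_from_phase_m(phase_rad, f_mod_hz):
    """Distance implied by a measured phase shift of the modulated light."""
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * f_mod_hz)


max_range = unambiguous_range_m(100e6)         # about 1.5 m at 100 MHz
half_way = depth_from_phase_m(math.pi, 100e6)  # about 0.75 m
```

A higher modulation frequency shortens the unambiguous range but makes a given phase error correspond to a smaller depth error, which is why high modulation contrast at 100 MHz is of interest.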

more
Theme224-ke Saturation Signal Global Shutter CMOS Image Sensor with In-Pixel Pinned Storage and Lateral Overflow Integration Capacitor
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameY. Sakano(Sony Semiconductor Solutions Corporation)
Details

The required incorporation of an additional in-pixel retention node for global shutter complementary metal-oxide semiconductor (CMOS) image sensors means that achieving a large saturation signal presents a challenge. This paper reports a 3.875-μm pixel single exposure global shutter CMOS image sensor with an in-pixel pinned storage (PST) and a lateral-overflow integration capacitor (LOFIC), which extends the saturation signal to 224 ke, thereby enabling the saturation signal per unit area to reach 14.9 ke/μm². This pixel can assure a large saturation signal by using a LOFIC for accumulation without degrading the image quality under dark and low illuminance conditions owing to the PST.
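
The per-area figure follows from the pixel size, as the short check below shows. It assumes a square pixel whose area is the pitch squared.

```python
def saturation_per_area(saturation_ke, pitch_um):
    """Saturation signal per unit area in ke per square micrometre."""
    return saturation_ke / (pitch_um ** 2)


density = saturation_per_area(224.0, 3.875)  # roughly 14.9
```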

more
ThemeA 4.1Mpix 280fps Stacked CMOS Image Sensor with Array-Parallel ADC Architecture for Region Control
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameT. Takahashi(Sony Semiconductor Solutions Corporation)
Details

A 4.1Mpix 280fps stacked CMOS image sensor with array-parallel ADC architecture is developed for region control applications. The combination of an active reset scheme and frame correlated double sampling (CDS) operation cancels Vth variation of pixel amplifier transistors and kTC noise. The sensor utilizes a floating diffusion (FD) based back-illuminated (BI) global shutter (GS) pixel with 4.2e-rms readout noise. An intelligent sensor system with face detection and high resolution region-of-interest (ROI) output is demonstrated with significantly low data bandwidth and low ADC power dissipation by utilizing a flexible area access function.

more
Theme3D integration technology for CMOS image sensors and future prospects
Academic ConferenceVLSI Symposia on Technology and Circuits(VLSI)
CategoryImaging / Sensing
NameR. Nakamura(Sony Semiconductor Solutions Corporation)

more
ThemeA 0.7V 1.5-to-2.3mW GNSS Receiver with 2.5-to-3.8dB NF in 28nm FD-SOI
Academic ConferenceInternational Solid-State Circuits Conference(ISSCC)
CategoryDevice / Material
NameK. Yamamoto(Sony Semiconductor Solutions Corporation)
Details

We are approaching the age of IoE, in which wearable devices such as smart watches will be widespread. Sensing processors play a key role and the Global Navigation Satellite System (GNSS) is considered fundamental. Power consumption is one of the most important characteristics for such sensing processors. However, current GNSS receivers consume around 10mW [1,2] and are difficult to embed. GNSS receivers require a high supply voltage for low-noise RF, which contributes to large power consumption. We developed 0.7V RF circuits that enable effective use of FD-SOI. Among the RF circuits, an LNA and an LPF are the key to 0.7V operation. We implemented an LNA with DC feedback using an OPAMP and an LPF that is composed of OTAs that have positive feedback as well as a mechanism for adjusting the output common-mode voltage.

more
ThemeA 12Gb/s 0.9mW/Gb/s Wide-Bandwidth Injection-Type CDR in 28nm CMOS with Reference-Free Frequency Capture
Academic ConferenceInternational Solid-State Circuits Conference(ISSCC)
CategoryDevice / Material
NameT. Masuda(Sony Semiconductor Solutions Corporation)
Details

The consumer electronics market demands high-speed and low-power serial data interfaces. The injection locked oscillator (ILO) based clock and data recovery (CDR) circuit [1-2], is a well-known solution for these demands. The typical solution has at least two oscillators: a master and one or more slaves. The master, a replica of the data path ILO, is part of a phase locked loop (PLL) used to correct the oscillator free-running frequency (FRF). The slave ILO phase locks to the incoming data but uses the frequency control from the master. Any FRF difference between the master and slave, such as that caused by PVT or mismatch, reduces the receiver performance. One solution to the reduced performance [3] uses burst data and corrects the FRF between bursts. However, for continuous data, injection forces the recovered clock frequency to match the incoming data rate, masking any FRF error from the frequency detector. Existing solutions [4-5] use a phase detector (PD) to measure the FRF. However, any static phase offset between the PD lock point and the ILO lock point causes the frequency control algorithm to converge incorrectly. Static phase offset can be caused by mismatch, PVT, or layout. This paper describes an ILO-type CDR, called the frequency-capturing ILO (FCILO), that eliminates the master oscillator and combines the ILO and PLL [6] type CDRs, realizing the benefit of both. The ILO gives wide bandwidth and fast locking while the PLL gives wide frequency capture range. The CDR architecture, shown in Fig 10.4.2, has a half-rate ILO, data and edge samplers making a bang-bang phase detector (BBPD), two 2:10 demuxes, and independent digital phase and frequency control. The ILO is made from current-starved inverters and driven by an edge detector. The ILO has coarse and fine frequency tuning. The strength of the unit inverter of the oscillator is adjusted for coarse tuning, keeping the normalized gain and delay constant over a wide range of frequencies. 
A current DAC is used for fine tuning. The edge detector shorts the ILO differential nodes together to align clock and data transitions. The BBPD outputs are used by the digital phase and frequency control to determine if ILO edges are early or late with respect to the incoming data and to correct the ILO FRF. A variable delay circuit controls the timing between data and clock inputs to the BBPD, correcting the static phase offset between the PD and ILO lock points.
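
The early/late decision of a bang-bang phase detector can be sketched with the classic three-sample rule. This is the generic textbook logic, not the paper's circuit; the sign convention and the simple vote counter are assumptions.

```python
def bang_bang_pd(d_prev, edge, d_cur):
    """Classify one bit interval from two data samples and one edge sample.

    The samples are taken in time order: d_prev, edge, d_cur.
    Returns -1 if the clock is early, +1 if it is late, and 0 if there
    was no data transition (no timing information).
    """
    if d_prev == d_cur:
        return 0
    # Edge sample still equals the old bit: it was taken before the transition.
    return -1 if edge == d_prev else +1


def accumulate_votes(samples):
    """Sum early/late decisions over a sequence of (d_prev, edge, d_cur)."""
    return sum(bang_bang_pd(*s) for s in samples)


votes = accumulate_votes([(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)])
```

A digital control loop would typically filter such votes before adjusting the oscillator phase or frequency.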

more
ThemeA 1ms High-Speed Vision Chip with 3D-Stacked 140GOPS Column-Parallel PEs for Spatio-Temporal Image Processing
Academic ConferenceInternational Solid-State Circuits Conference(ISSCC)
CategoryImaging / Sensing
NameT. Yamazaki(Sony Semiconductor Solutions Corporation)
Details

High-speed vision systems that combine high-frame-rate imaging and highly parallel signal processing enable instantaneous visual feedback to rapidly control machines at speeds exceeding human visual recognition. Such systems also enable a reduction in circuit scale by using a fast and simple algorithm optimized for high-frame-rate processing. Previous studies on vision systems and chips [1-4] have yielded low imaging performance due to large matrix-based processing element (PE) parallelization [1-3], and low functionality of the limited-purpose column-parallel PE architecture [4], constraining vision-chip applications.

more
Theme: A 1/2.3in 20Mpixel 3-Layer Stacked CMOS Image Sensor with DRAM
Academic Conference: International Solid-State Circuits Conference (ISSCC)
Category: Imaging / Sensing
Name: T. Haruta (Sony Semiconductor Solutions Corporation)
Details

In recent years, the performance of cellphone cameras has improved and is becoming comparable to that of SLR cameras. However, a major difference between cellphone cameras and SLR cameras is the distortion caused by the rolling exposure of CMOS image sensors (CISs), because cellphone cameras cannot have mechanical shutters. In addition to this technical problem, demand for high quality in dark scenes and in movies is increasing. Frame-level signal processing can solve these problems, but previous generations of CIS could not achieve both high-speed readout and accessible I/F speed. This paper presents a 3-layer-stacked back-illuminated CMOS image sensor (3L-BI-CIS) with mounted DRAM as the frame memory.

Theme: A 1/4-inch 3.9Mpixel Low-Power Event-Driven Back-Illuminated Stacked CMOS Image Sensor
Academic Conference: International Solid-State Circuits Conference (ISSCC)
Category: Imaging / Sensing
Name: O. Kumagai (Sony Semiconductor Solutions Corporation)
Details

Wireless products such as smart home-security cameras, intelligent agents, and virtual personal assistants are evolving rapidly to satisfy our needs. Small size, extended battery life, and transparent machine interfaces are all required of the camera system in these applications. In battery-limited environments, these applications can profit from an event-driven approach to moving-object detection. This paper presents a 1/4-inch 3.9Mpixel low-power event-driven (ED) back-illuminated stacked CMOS image sensor (CIS) with a pixel readout circuit that detects moving objects for each pixel under lighting conditions ranging from 1 to 64,000lux. Utilizing pixel summation in a shared floating diffusion (FD) for each pixel block, moving-object detection is realized at 10 frames per second while consuming only 1.1mW, a 99% reduction from the 95mW that the same CIS consumes at full resolution and 60fps.
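The quoted power saving can be checked with a short calculation. The sketch below uses only the two power figures stated in the abstract; the helper name `power_reduction` is hypothetical.

```python
def power_reduction(full_power_mw: float, reduced_power_mw: float) -> float:
    """Fractional power saving relative to the full-power mode."""
    return 1.0 - reduced_power_mw / full_power_mw


# Figures from the abstract: 95 mW at full resolution and 60 fps,
# 1.1 mW in event-driven moving-object detection at 10 fps.
saving = power_reduction(95.0, 1.1)
print(f"{saving:.1%}")  # 98.8%
```

The result, about 98.8%, is consistent with the stated reduction of roughly 99%.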

Theme: A Back-Illuminated Global-Shutter CMOS Image Sensor with Pixel-Parallel 14b Subthreshold ADC
Academic Conference: International Solid-State Circuits Conference (ISSCC)
Category: Imaging / Sensing
Name: M. Sakakibara (Sony Semiconductor Solutions Corporation)
Details

Rolling-shutter CMOS image sensors (CISs) are widely used [1,2]. However, the distortion of moving subjects remains an unresolved problem, regardless of the speed at which these sensors are operated. It has been reported that by adopting in-pixel analog memory (MEM), a global shutter (GS) can be achieved by saving all pixels simultaneously as stored charges [3,4]. However, because signals from a storage unit are read in a column-wise sequence, a light-shielding structure is required for the MEM to suppress the influence of parasitic light during the reading period. Pixel-parallel ADCs have been reported as a way of implementing GS in circuitry [5,6]. However, these techniques have not been successful at megapixel scale because they do not address two issues that grow with the number of pixels: the timing constraint for reading and writing a digital signal to and from an in-pixel ADC, and the increase in the total power consumption of the massively parallel comparators (CMs).

Theme: Compressive Imaging for CMOS Image Sensors
Academic Conference: International Solid-State Circuits Conference (ISSCC)
Category: Imaging / Sensing
Name: Y. Oike (Sony Semiconductor Solutions Corporation)

Theme: Development of the High Efficiency Video Coding (HEVC) standard
Academic Conference: 2017 Primetime Emmy Engineering Award
Commendation institutions: The Academy of Television Arts & Sciences
Category: Audio / Visual
Name: Teruhiko Suzuki
Details

The development of High Efficiency Video Coding (HEVC) has enabled efficient delivery of ultra-high-definition (UHD) content over multiple distribution channels. This new compression coding has been adopted, or selected for adoption, by all UHD television distribution channels, including terrestrial, satellite, cable, fiber and wireless, as well as all UHD viewing devices, including traditional televisions, tablets and mobile phones.
