ICASSP 2025
The 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) is the IEEE Signal Processing Society’s flagship conference on signal processing and its applications. This event marks the 50th anniversary of ICASSP and will be the first time it is hosted in India. The conference theme is “Celebrating Signal Processing”.
Sony is proud to be a Platinum Sponsor of ICASSP 2025. We look forward to this year's exciting sponsorship and exhibition opportunities, featuring a variety of ways to connect with participants in person.
Exhibition Booth - Open Hours
Visit our booth at the exhibition to explore our latest technology firsthand and engage with our team.
Location: #C4 (Hall 1-3, Ground Floor)
Sony Booth Open Hours:
- April 8th (Tue) 09:00 – 17:00 (IST)
- April 9th (Wed) 09:00 – 17:00 (IST)
- April 10th (Thu) 09:00 – 17:00 (IST)
- April 11th (Fri) 09:00 – 17:00 (IST)
Exhibition Booth - Technology Presentation
This schedule is subject to change. Please inquire at the Exhibition Booth for the latest schedule.
Emotion and Duration Controllability in Speech Generation Techniques
< Presentation >
Presenter : Ashish Gudmalwar
Date/Time :
- April 10th (Thu) 11:15 - 11:30 (IST)
- April 11th (Fri) 15:30 - 15:45 (IST)
< Link >
- Advancing AI for Voice Synthesis and Localization
- EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion (arxiv)
- DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing (arxiv)
High-Resolution Speech Restoration with Latent Diffusion Model
< Presentation >
Presenter : Tushar Dhyani
Date/Time :
- April 8th (Tue) 11:30 - 11:45 (IST)
- April 11th (Fri) 13:45 - 14:00 (IST)
< Link >
- High-Resolution Speech Restoration with Latent Diffusion Model (arxiv)
- High-Resolution Speech Restoration with Latent Diffusion Model - ICASSP 25 presentation (YouTube)
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
< Presentation >
Presenter : Yosuke Kashiwagi
Date/Time :
- April 9th (Wed) 10:00 - 10:15 (IST)
- April 11th (Fri) 11:15 - 11:30 (IST)
< Link >
- Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens (arxiv)
Improving MSR for Indian Languages using large foundation models
< Presentation >
Presenter : Raj Gothi
Date/Time :
- April 9th (Wed) 11:15 - 11:30 (IST)
- April 11th (Fri) 10:00 - 10:15 (IST)
< Link >
- Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization (arxiv)
Deep Generative Models for Audio-Visual
< Presentation >
Presenter : Cong Bac Nguyen
Date/Time :
- April 8th (Tue) 15:30 - 15:45 (IST)
- April 10th (Thu) 15:30 - 15:45 (IST)
< Link >
- GitHub - sony/creativeai
Sony's Music Foundation Models
< Presentation >
Presenter : Shusuke Takahashi
Date/Time :
- April 9th (Wed) 15:30 - 15:45 (IST)
- April 10th (Thu) 10:00 - 10:15 (IST)
< Link >
- GitHub - sony/creativeai
- Music Foundation Model as Generic Booster for Music Downstream Tasks (arxiv)
- OpenMU: Your Swiss Army Knife for Music Understanding (arxiv)
- DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning (arxiv)
- Cross-Modal Learning for Music-to-Music-Video Description Generation (arxiv)
Stem Generation with Latent Diffusion (Diff-A-Riff, Latent Diffusion)
< Presentation >
Presenter : Stefan Lattner
Date/Time :
- April 8th (Tue) 15:15 - 15:30 (IST)
- April 9th (Wed) 15:15 - 15:30 (IST)
< Link >
- Improving Musical Accompaniment Co-creation via Diffusion Transformers (arxiv)
- Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding (arxiv)
- Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems (arxiv)
- Estimating Musical Surprisal in Audio (arxiv)
- Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures (arxiv)
- Hybrid Losses for Hierarchical Embedding Learning (arxiv)
Corporate Information
< Presentation >
Date/Time :
- April 8th (Tue) 13:45 - 14:00 (IST)
- April 9th (Wed) 13:45 - 14:00 (IST)
- April 11th (Fri) 13:30 - 13:45 (IST)
Sony's Technical Programs at ICASSP 2025
Tutorial
Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing
- Authors : Chieh-Hsin Lai (Sony AI), Koichi Saito (Sony AI), Bac Nguyen (Sony Europe B.V.), Yuki Mitsufuji (Sony AI), Stefano Ermon
- Date/Time : April 7th (Mon) | 14:00 - 17:30 (IST)
- Location : Hall 6
< Link >
- Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing (ICASSP 2025)
- Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing (Tutorial site)
Lecture
[50 years of Audio and Acoustic Signal Processing / AASP-SS6-L7] 30+ Years of Source Separation Research: Achievements and Future Challenges
- Authors : Shoko Araki, Nobutaka Ito, Reinhold Haeb-Umbach, Gordon Wichern, Zhong-Qiu Wang, Yuki Mitsufuji (Sony AI)
- Date/Time : April 11th (Fri) | 14:00 - 15:30 (IST)
- Location : MR1.02
< Link >
- 30+ Years of Source Separation Research: Achievements and Future Challenges (arxiv)
[Music analysis II / AASP-L4] Estimating Musical Surprisal in Audio
- Authors : Mathias Rose Bjare, Giorgia Cantisani, Stefan Lattner (Sony Computer Science Laboratories), Gerhard Widmer
- Date/Time : April 9th (Wed) | 11:30 - 13:00 (IST)
- Location : MR1.02
< Link >
- Estimating Musical Surprisal in Audio (arxiv)
[Deep generative models I / MLSP-L9] Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
- Authors : Marco Pasini, Stefan Lattner (Sony Computer Science Laboratories), George Fazekas
- Date/Time : April 9th (Wed) | 14:00 - 15:30 (IST)
- Location : MRG.02
< Link >
- Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding (arxiv)
[50 years of Audio and Acoustic Signal Processing / AASP-SS6-L7] Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges
- Authors : Geoffroy Peeters, Zafar Rafii, Magdalena Fuentes, Zhiyao Duan, Emmanouil Benetos, Juhan Nam, Yuki Mitsufuji (Sony AI)
- Date/Time : April 11th (Fri) | 14:00 - 15:30 (IST)
- Location : MR1.02
< Link >
- Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges (IEEE Xplore)
[Neural Audio Coding / AASP-L2] Variable Bitrate Residual Vector Quantization for Audio Coding
- Authors : Yunkee Chae (Sony AI/IPAI), Woosung Choi (Sony AI), Yuhta Takida (Sony AI), Junghyun Koo (Sony AI), Yukara Ikemiya (Sony AI), Zhi Zhong (Sony Group Corporation), Kin Wai Cheuk (Sony AI), Marco A. Martínez-Ramírez (Sony AI), Kyogu Lee, Wei-Hsiang Liao (Sony AI), Yuki Mitsufuji (Sony AI/Sony Group Corporation)
- Date/Time : April 8th (Tue) | 17:00 - 18:30 (IST)
- Location : MR1.02
< Link >
- VRVQ: Variable Bitrate Residual Vector Quantization for Audio Coding (arxiv)
Poster
[Music analysis I / AASP-P1] COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
- Authors : Ruben Ciranni, Giorgio Mariani, Michele Mancusi (Sony Europe B.V.), Emilian Postolache, Giorgio Fabbro (Sony Europe B.V.), Emanuele Rodolà, Luca Cosmo
- Date/Time : April 8th (Tue) | 14:00 - 15:30 (IST)
- Location : Poster 2A
< Link >
- COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations (arxiv)
[Multilingual Speech Processing / SLP-P43] Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization
- Authors : Kumud Tripathi (Sony Research India), Raj Gothi (Sony Research India), Pankaj Wasnik (Sony Research India)
- Date/Time : April 11th (Fri) | 08:30 - 10:00 (IST)
- Location : Poster 2E
< Link >
- Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization (arxiv)
[Application of Image and Video Processing VII / IVMSP-P39] Foundation Models Boost Low-Level Perceptual Similarity Metrics
- Authors : Abhijay Ghildyal, Nabajeet Barman (Sony Interactive Entertainment (PlayStation)), Saman Zadtootaghaj (Sony Interactive Entertainment (PlayStation))
- Date/Time : April 11th (Fri) | 14:00 - 15:30 (IST)
- Location : Poster 2H
< Link >
- Foundation Models Boost Low-Level Perceptual Similarity Metrics (IEEE Xplore)
[Speech Enhancement and Extraction III / SLP-P2] High-Resolution Speech Restoration with Latent Diffusion Model
- Authors : Tushar Dhyani (Sony Europe B.V./University of Stuttgart), Florian Lux, Michele Mancusi (Sony Europe B.V.), Giorgio Fabbro (Sony Europe B.V.), Fritz Hohl (Sony Europe B.V.), Ngoc Thang Vu
- Date/Time : April 8th (Tue) | 14:00 - 15:30 (IST)
- Location : Poster 2D
< Link >
- High-Resolution Speech Restoration with Latent Diffusion Model (arxiv)
[Music analysis I / AASP-P1] Hybrid Losses for Hierarchical Embedding Learning
- Authors : Haokun Tian, Stefan Lattner (Sony Computer Science Laboratories), Brian McFee, Charalampos Saitis
- Date/Time : April 8th (Tue) | 14:00 - 15:30 (IST)
- Location : Poster 2A
< Link >
- Hybrid Losses for Hierarchical Embedding Learning (arxiv)
[Multi-Talker and Speaker-Informed ASR / SLP-P47] Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
- Authors : Yosuke Kashiwagi (Sony Group Corporation), Hayato Futami (Sony Group Corporation), Emiru Tsunoo (Sony Group Corporation), Siddhant Arora, Shinji Watanabe
- Date/Time : April 11th (Fri) | 11:30 - 13:00 (IST)
- Location : Poster 2E
< Link >
- Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens (arxiv)
[Music signal processing, production and separation I / AASP-P13] Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer
- Authors : Michele Mancusi (Sony Europe B.V.), Yurii Halychanskyi, Kin Wai Cheuk (Sony AI), Eloi Moliner, Chieh-Hsin Lai (Sony AI), Stefan Uhlich (Sony Europe B.V.), Junghyun Koo (Sony AI), Marco A. Martínez-Ramírez (Sony AI), Wei-Hsiang Liao (Sony AI), Giorgio Fabbro (Sony Europe B.V.), Yuki Mitsufuji (Sony AI)
- Date/Time : April 10th (Thu) | 08:30 - 10:00 (IST)
- Location : Poster 2A
< Link >
- Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer (arxiv)
[Music analysis I / AASP-P1] Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
- Authors : Alain Riou (Télécom-Paris/Sony Computer Science Laboratories), Antonin Gagneré, Gaëtan Hadjeres (Sony AI), Stefan Lattner (Sony Computer Science Laboratories), Geoffroy Peeters
- Date/Time : April 8th (Tue) | 14:00 - 15:30 (IST)
- Location : Poster 2A
< Link >
- Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures (arxiv)
Event Report
Sony had a strong presence at ICASSP 2025 in Hyderabad, India, where we introduced our latest research and technologies.
We were delighted to welcome many visitors to our booth and engage in meaningful conversations with researchers and professionals from around the world.
Our Technology Blog captures these moments along with details of our accepted papers.
Visit to learn more about our contributions:
https://www.sony.com/en/SonyInfo/technology/stories/entries/ICASSP2025_report/
Tech stories: Building technologies to expand the future of sound for creators
Yuki Mitsufuji, Distinguished Engineer at Sony Group, will share his research on applications of sound technology in a tutorial session at ICASSP 2025. We sat down with him to talk about his efforts to expand the value and possibilities of audio, and the challenges he aims to tackle in the future.
https://www.sony.com/en/SonyInfo/technology/stories/entries/interview_de_mitsufuji/
Career Information
We look forward to working with highly motivated individuals to fill the world with emotion and to pioneer future innovation through dreams and curiosity. If you are interested, please visit our career site or stop by the Sony Information Desk at the Sony booth (#C4) in the ICASSP exhibition hall (Hall 1-3) to learn more about Sony Group.
Career Site Link: https://www.sony.com/en/SonyInfo/Careers/
Sony Women in Technology Award with Nature
This annual award honors outstanding early- to mid-career women researchers pioneering breakthroughs in science, technology, engineering, and mathematics.
Each year, Sony and Nature recognize three researchers with a prize of $250,000 USD each and the opportunity to showcase their work on nature.com.
The application period for the 2026 award is now closed. Eligible ICASSP attendees are encouraged to apply next year. Join our newsletter to be the first to know when the next cycle opens.
Sony Group Technology Portal
Explore more of our technology on the Sony Group Technology Portal.