Multi-Modal Robust Speech Processing, Analysis and Recognition in Reality

       Department of Systems Engineering and Engineering Management

                     The Chinese University of Hong Kong


Date: Thursday, August 3, 2023, 10:30 am

Venue: ERB 513, The Chinese University of Hong Kong

Title: Multi-Modal Robust Speech Processing, Analysis and Recognition
in Reality

Speaker: Prof. Yanmin Qian, Department of Computer Science and
Engineering, Shanghai Jiao Tong University

ZOOM Link:

ZOOM Meeting ID: 916 8838 5965


Although intelligent speech processing has advanced greatly in research and is widely used in many real-life applications, a large performance gap remains between controlled environments and real-life scenarios. Multi-modality research is one of the important strategies for boosting the performance of speech processing systems in the real world, and has become a hot topic in both academia and industry. In this talk, we will summarize recent progress and present our efforts on multi-modal speech processing in complex real-world scenarios, focusing on new techniques developed at SJTU for multi-modal speaker identification, speech separation and enhancement, speech recognition, scene analysis, and self-supervised pre-trained models.


Dr. Yanmin Qian is a Full Professor at Shanghai Jiao Tong University (SJTU), China. He received his PhD from the Department of Electronic Engineering at Tsinghua University, China, in 2012. In 2013 he joined the Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2015 to 2016 he also worked as a Research Associate with the Speech Group at the Cambridge University Engineering Department, UK, where he was one of the key members who designed and implemented the Cambridge Multi-Genre Broadcast speech processing system, which won all four tasks of the first MGB Challenge in 2015. He is a Senior Member of IEEE, a member of ISCA, and one of the founding members of the Kaldi speech recognition toolkit. He has published more than 200 papers on speech and language processing, with 11,000+ citations, in venues including T-ASLP, Speech Communication, ICASSP, INTERSPEECH, and ASRU. His current research interests include automatic speech recognition and translation, speaker and language recognition, speech separation and enhancement, natural language understanding, deep learning, and multimedia signal processing. He has received several awards, including the Best Paper Award from Speech Communication and the Best Paper Award from IEEE ASRU in 2019. He is also a member of the IEEE Signal Processing Society Speech and Language Technical Committee.

Everyone is welcome to attend the talk!
SEEM-5201 Website:
