12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR

Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto

NTT Corporation, Japan

This paper proposes a new multi-channel noise reduction approach that can appropriately handle highly nonstationary noise by exploiting the spectral and locational features of speech and noise. We focus on a distant-talking scenario in which a two-channel microphone array captures a target speaker's voice arriving from the front together with highly nonstationary ambient noise arriving from any direction. To cope with this scenario, we introduce prior training not only for the spectral features of speech and noise but also for their locational features, and we use both in a unified manner. The proposed method can distinguish rapid changes in speech and noise based mainly on their locational features, while it reliably estimates the spectral shapes of the speech based largely on the spectral features. A filter-bank-based implementation is also discussed that enables the proposed method to work in real time. Experiments on the PASCAL CHiME separation and recognition challenge task show the superiority of the proposed method in terms of both speech quality and automatic speech recognition performance.
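The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of combining a locational cue and a spectral cue in one decision per time-frequency bin; it is not the authors' method. It assumes (1) the locational feature is the inter-channel phase difference (IPD), which is near zero for a frontal source on a matched, broadside two-microphone array, modeled with a von Mises density for speech and a uniform density for diffuse noise; (2) the spectral feature is the log power of the reference channel, modeled with one Gaussian per class and frequency bin; (3) the two features are conditionally independent given the class. The names `speech_posterior` and `enhance_frame`, and the values of `kappa`, `var`, `prior_speech`, `mu_speech` and `mu_noise`, are hypothetical placeholders for parameters that would come from prior training.

```python
import cmath
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (x / 2.0) ** 2 / (m * m)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Circular density used here (by assumption) for the IPD of frontal speech."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def speech_posterior(x1, x2, mu_speech, mu_noise, var=4.0, kappa=8.0, prior_speech=0.5):
    """Hypothetical posterior that one STFT bin is dominated by frontal speech.

    x1, x2: complex STFT coefficients of the two channels for this bin.
    mu_speech, mu_noise: assumed pre-trained log-power means for this bin.
    """
    # Locational cue: IPD is ~0 for a frontal source, uniform for diffuse noise.
    ipd = cmath.phase(x2 * x1.conjugate())
    loc_speech = von_mises_pdf(ipd, 0.0, kappa)
    loc_noise = 1.0 / (2.0 * math.pi)

    # Spectral cue: log power of the reference channel under each class model.
    log_power = math.log(abs(x1) ** 2 + 1e-12)
    spec_speech = gaussian_pdf(log_power, mu_speech, var)
    spec_noise = gaussian_pdf(log_power, mu_noise, var)

    # Bayes rule, assuming the two cues are conditionally independent.
    num = prior_speech * loc_speech * spec_speech
    den = num + (1.0 - prior_speech) * loc_noise * spec_noise
    return num / den


def enhance_frame(X1, X2, mu_speech, mu_noise, **kwargs):
    """Apply the posterior as a soft mask to channel 1, bin by bin."""
    return [
        speech_posterior(a, b, ms, mn, **kwargs) * a
        for a, b, ms, mn in zip(X1, X2, mu_speech, mu_noise)
    ]
```

For example, `enhance_frame([1+0j, 1+0j], [1+0j, -1+0j], [0.0, 0.0], [-4.0, -4.0])` keeps nearly all of the first bin (in-phase, hence frontal) and suppresses the second (IPD of pi, hence off-axis), even though both bins have identical power. In this simplified form the IPD term reacts instantly to where the energy comes from, while the spectral term depends on trained level models; the real system's features, models, and real-time filter-bank implementation are described in the full paper and may differ substantially.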


Bibliographic reference.  Nakatani, Tomohiro / Araki, Shoko / Delcroix, Marc / Yoshioka, Takuya / Fujimoto, Masakiyo (2011): "Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR", In INTERSPEECH-2011, 1785-1788.