EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Statistical Sound Source Identification in a Real Acoustic Environment for Robust Speech Recognition Using a Microphone Array

Takanobu Nishiura (1), Satoshi Nakamura (1), Kiyohiro Shikano (2)

(1) ATR Spoken Language Translation Research Laboratories, Japan
(2) Nara Institute of Science and Technology, Japan

For a hands-free speech interface, it is very important to capture distant-talking speech with high quality. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. To cope with this problem, we propose a new talker localization method consisting of two algorithms. The first localizes multiple sound sources based on CSP (Cross-power Spectrum Phase) analysis. The second identifies which of the localized sound sources is the talker. In this paper, we focus on the second algorithm: statistical sound source identification, which classifies each localized source using a microphone array together with statistical speech and environmental sound models based on GMMs (Gaussian Mixture Models).
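
The identification step can be illustrated with a minimal sketch, assuming diagonal-covariance GMMs and a maximum-likelihood decision between a speech model and an environmental sound model. The abstract does not specify the features, the model sizes, or the decision rule, so everything below is hypothetical: the names `log_gauss_diag`, `gmm_log_likelihood`, `identify_source`, and `MODELS`, the two-dimensional toy features, and all parameter values are made up for illustration. In practice the models would be trained on real data and the frames would be feature vectors extracted from the signal arriving from one localized direction.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # Log of the weighted component densities, combined with log-sum-exp
        # so that very small densities do not underflow.
        terms = [math.log(w) + log_gauss_diag(x, mean, var) for w, mean, var in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify_source(frames, models):
    """Return the best-scoring model label and the score of every model."""
    scores = {label: gmm_log_likelihood(frames, gmm) for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models over 2-dimensional features (assumed values,
# not taken from the paper).
MODELS = {
    "speech": [
        (0.6, (1.0, 1.0), (0.5, 0.5)),
        (0.4, (2.0, 0.0), (0.5, 0.5)),
    ],
    "non-speech": [
        (0.5, (-1.0, -1.0), (0.5, 0.5)),
        (0.5, (-2.0, 0.0), (0.5, 0.5)),
    ],
}

if __name__ == "__main__":
    # Frames standing in for features of one localized sound source.
    frames = [(1.1, 0.9), (1.8, 0.2), (0.9, 1.2)]
    label, scores = identify_source(frames, MODELS)
    print(label, scores)
```

Under these assumptions, each localized source would be scored separately, and a source labeled as speech would be treated as the talker candidate. Averaging the log-likelihood over frames keeps scores comparable between segments of different lengths.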

Bibliographic reference.  Nishiura, Takanobu / Nakamura, Satoshi / Shikano, Kiyohiro (2001): "Statistical sound source identification in a real acoustic environment for robust speech recognition using a microphone array", In EUROSPEECH-2001, 2611-2614.