12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Multichannel Feature-Based Processing for Robust Speech Recognition

Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani

NTT Corporation, Japan

We propose a new approach to robust multichannel speech recognition. The approach extends vector Taylor series (VTS)-based feature compensation from the single-channel to the multichannel case. Specifically, we use a first-order VTS to approximate the feature vector of each microphone. These features are then processed jointly to estimate the acoustic channel and noise statistics via expectation maximization (EM). Experimental results with TI-Digits and measured impulse responses show that the proposed method can achieve significant gains in word recognition accuracy under different noise conditions.
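
To illustrate the first-order VTS step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the log-spectral mismatch function commonly used in the VTS literature, y = x + h + log(1 + exp(n - x - h)), where x is clean speech, h the acoustic channel, and n the noise; the abstract does not state the exact model or feature domain used in the paper. The function names (`mismatch`, `vts_first_order`, `multichannel_vts`) and the per-microphone tuple layout are hypothetical, and the EM estimation of channel and noise statistics is not shown.

```python
import math

def mismatch(x, h, n):
    """Assumed log-spectral mismatch function, applied per frequency bin:
    y = x + h + log(1 + exp(n - x - h))."""
    return [xi + hi + math.log1p(math.exp(ni - xi - hi))
            for xi, hi, ni in zip(x, h, n)]

def vts_first_order(x, h, n, x0, h0, n0):
    """First-order Taylor expansion of mismatch() around (x0, h0, n0)."""
    y0 = mismatch(x0, h0, n0)
    approx = []
    for i in range(len(x)):
        # g = dy/dx = dy/dh at the expansion point; dy/dn = 1 - g.
        g = 1.0 / (1.0 + math.exp(n0[i] - x0[i] - h0[i]))
        approx.append(y0[i]
                      + g * (x[i] - x0[i])
                      + g * (h[i] - h0[i])
                      + (1.0 - g) * (n[i] - n0[i]))
    return approx

def multichannel_vts(x, x0, channels):
    """Linearize each microphone's features separately (hypothetical layout).

    channels: one (h, n, h0, n0) tuple per microphone, so every channel
    has its own acoustic channel and noise terms and expansion point.
    """
    return [vts_first_order(x, h, n, x0, h0, n0)
            for (h, n, h0, n0) in channels]
```

In this sketch the same clean-speech vector `x` is shared by all microphones while `h` and `n` differ per channel, which is one plausible reading of the multichannel extension described in the abstract.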


Bibliographic reference.  Souden, Mehrez / Kinoshita, Keisuke / Delcroix, Marc / Nakatani, Tomohiro (2011): "A multichannel feature-based processing for robust speech recognition", In INTERSPEECH-2011, 689-692.