13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Example-based Speech Enhancement With Joint of Spatial, Spectral & Temporal Cues of Speech and Noise

Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani

NTT Communication Science Laboratories, NTT Corporation, Japan

This paper proposes a multichannel speech enhancement technique that leverages three essential cues embedded in the observed signal, i.e., spatial, spectral, and temporal cues, to differentiate the underlying clean speech component from noise. The proposed method estimates clean speech and noise features under a single optimization criterion by integrating two approaches, namely, example-based and model-based multichannel speech enhancement: the former utilizes spectral and temporal cues, while the latter utilizes spatial and spectral cues. In the experiments, we show the superiority of the proposed method over conventional methods in terms of automatic keyword recognition performance in an adverse and highly non-stationary noisy environment.
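The abstract describes combining spatial filtering with example-based spectral/temporal modeling. The toy sketch below is not the authors' algorithm; it is a heavily simplified illustration, under assumed conventions, of how a spatially combined signal can be refined with an example-based spectral mask: channels are averaged (a zero-delay delay-and-sum stand-in for the spatial cue), and each frame is matched against a hypothetical dictionary of clean-speech spectra to derive a Wiener-like gain (the spectral/temporal cue). All names and the random toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrograms: (channels, freq bins, frames)
n_ch, n_freq, n_frames = 2, 16, 20
clean = np.abs(rng.normal(size=(n_freq, n_frames)))
noise = 0.5 * np.abs(rng.normal(size=(n_freq, n_frames)))
obs = np.stack([clean + noise] * n_ch)  # identical channels, for simplicity

# Spatial cue (placeholder): average channels, i.e., delay-and-sum with zero delays.
beamformed = obs.mean(axis=0)  # shape: (freq, frames)

# Spectral/temporal cue: a hypothetical dictionary of clean-speech example spectra.
examples = np.abs(rng.normal(size=(50, n_freq)))

def example_based_mask(spec, examples, floor=0.1):
    """For each frame, find the nearest clean example and form a Wiener-like mask."""
    mask = np.empty_like(spec)
    for t in range(spec.shape[1]):
        frame = spec[:, t]
        dists = np.linalg.norm(examples - frame, axis=1)  # nearest-neighbor search
        best = examples[np.argmin(dists)]
        # Gain = estimated clean / observed, clipped to a sensible range.
        mask[:, t] = np.clip(best / (frame + 1e-8), floor, 1.0)
    return mask

mask = example_based_mask(beamformed, examples)
enhanced = mask * beamformed  # spectrally refined, spatially combined output
```

In the actual paper, the two cues are not cascaded like this but estimated jointly under a single optimization criterion; this sketch only separates the two ingredients for readability.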

Index Terms: example-based speech enhancement, model-based approach, speech recognition, blind source separation


Bibliographic reference. Kinoshita, Keisuke / Delcroix, Marc / Souden, Mehrez / Nakatani, Tomohiro (2012): "Example-based speech enhancement with joint of spatial, spectral & temporal cues of speech and noise", In INTERSPEECH-2012, 1926-1929.