13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Dereverberation based on Wavelet Packet Filtering for Robust Automatic Speech Recognition

Randy Gomez, Tatsuya Kawahara

Kyoto University, ACCMS, Sakyo-ku, Kyoto, Japan

This paper describes a multiple-resolution signal analysis method that suppresses the late reflections of reverberation for robust automatic speech recognition (ASR). Wavelet packet tree (WPT) decomposition offers a finer resolution for discriminating the late reflection subspace from the speech subspace. By selecting appropriate wavelet bases in the WPT for speech and for late reflection, we can effectively estimate the Wiener gain directly from the observed reverberant data. Moreover, the selection procedure is performed in accordance with the acoustic model likelihood used by the speech recognizer, so as to improve ASR performance. Dereverberation is realized by filtering the wavelet packet coefficients with the Wiener gain to suppress the effects of the late reflections. Experimental evaluations with large vocabulary continuous speech recognition (LVCSR) in real reverberant conditions show that the proposed method outperforms conventional wavelet-based methods and other dereverberation techniques.
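The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Haar wavelet and a fixed full tree instead of the paper's likelihood-driven basis selection, and it assumes the late reflection power of each band is supplied by the caller (the hypothetical `late_power_per_band` argument), with speech power taken as the non-negative remainder of the observed band power. All function names are illustrative.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) half-bands."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def wpt_decompose(x, depth):
    """Full wavelet packet tree: split every node at every level.

    len(x) must be divisible by 2**depth. Returns the 2**depth leaf bands.
    """
    nodes = [list(x)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def wpt_reconstruct(leaves):
    """Merge sibling bands pairwise until the time-domain signal is rebuilt."""
    nodes = leaves
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def wiener_gain(speech_power, late_power, floor=1e-12):
    """Wiener gain S / (S + R); the floor avoids division by zero."""
    return speech_power / max(speech_power + late_power, floor)


def dereverberate(x, depth, late_power_per_band):
    """Scale each band's packet coefficients by its Wiener gain.

    late_power_per_band is an assumed, externally estimated input.
    """
    filtered = []
    for band, late_power in zip(wpt_decompose(x, depth), late_power_per_band):
        observed_power = sum(c * c for c in band) / len(band)
        # Assumption: speech power is what remains after removing late power.
        speech_power = max(observed_power - late_power, 0.0)
        gain = wiener_gain(speech_power, late_power)
        filtered.append([gain * c for c in band])
    return wpt_reconstruct(filtered)
```

With zero late reflection power in every band the gain is 1 and the signal passes through unchanged; as the assumed late reflection power grows relative to the observed band power, the gain falls toward 0 and that band is attenuated.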

Index Terms: Speech recognition, Robustness, Dereverberation, Wavelet Packets

Bibliographic reference. Gomez, Randy / Kawahara, Tatsuya (2012): "Dereverberation based on wavelet packet filtering for robust automatic speech recognition", in INTERSPEECH-2012, 1243-1246.