INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Reverberant Speech Recognition Based on Denoising Autoencoder

Takaaki Ishii (1), Hiroki Komiyama (1), Takahiro Shinozaki (2), Yasuo Horiuchi (1), Shingo Kuroiwa (1)

(1) Chiba University, Japan
(2) Tokyo Institute of Technology, Japan

A denoising autoencoder is applied to reverberant speech recognition as a noise-robust front-end that reconstructs the clean speech spectrum from noisy input. To capture context effects of speech sounds, a window of multiple short-windowed spectral frames is concatenated to form a single input vector. Additionally, a combination of short- and long-term spectra is investigated to properly handle the long impulse response of reverberation while keeping the time resolution necessary for speech recognition. Experiments are performed using the CENSREC-4 dataset, which is designed as an evaluation framework for distant-talking speech recognition. Experimental results show that the proposed denoising-autoencoder-based front-end using the short-windowed spectra gives better results than conventional methods. Combining the long-term spectra yields further improvement. The recognition accuracy of the proposed method using the short- and long-term spectra is 97.0% on the open-condition test set of the dataset, compared with 87.8% for a baseline based on multi-condition training. As a supplemental experiment, large-vocabulary speech recognition is also performed, confirming the effectiveness of the proposed method.
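The concatenation of neighbouring spectral frames into a single input vector, as described in the abstract, can be illustrated with a minimal sketch. The function name `splice_frames`, the symmetric context of two frames on each side, the edge-padding strategy, and the toy data are all assumptions made for illustration; the abstract does not specify the window length or how utterance boundaries are handled.

```python
import numpy as np


def splice_frames(frames: np.ndarray, left: int, right: int) -> np.ndarray:
    """Concatenate each frame with its neighbouring context frames.

    frames: array of shape (num_frames, num_bins), one spectrum per row.
    Returns an array of shape (num_frames, (left + 1 + right) * num_bins).
    Boundaries are padded by repeating the first/last frame
    (an assumption, not taken from the paper).
    """
    num_frames, _ = frames.shape
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset (k - left) from t.
    shifted = [padded[k:k + num_frames] for k in range(left + 1 + right)]
    return np.concatenate(shifted, axis=1)


# Toy "spectrogram": 5 frames with 3 bins each; frame t holds the value t.
toy = np.repeat(np.arange(5.0)[:, None], 3, axis=1)
spliced = splice_frames(toy, left=2, right=2)
print(spliced.shape)
```

In a front-end of this kind, each row of `spliced` would serve as the input vector to the autoencoder, with the corresponding clean spectrum as the reconstruction target. A long-term spectrum could in principle be appended to each row in the same way, but how the paper combines the two is not stated in the abstract.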


Bibliographic reference.  Ishii, Takaaki / Komiyama, Hiroki / Shinozaki, Takahiro / Horiuchi, Yasuo / Kuroiwa, Shingo (2013): "Reverberant speech recognition based on denoising autoencoder", In INTERSPEECH-2013, 3512-3516.