13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data

Alexander Krueger (1), Oliver Walter (2), Volker Leutnant (2), Reinhold Haeb-Umbach (2)

(1) Research & Innovation, Technicolor, Hannover, Germany
(2) Department of Communications Engineering, University of Paderborn, Germany

In this contribution we investigate the effectiveness of Bayesian feature enhancement (BFE) on a medium-sized recognition task containing real-world recordings of noisy reverberant speech. BFE employs a very coarse model of the acoustic impulse response (AIR) from the source to the microphone. This model has been shown to be effective when the speech to be recognized is generated by artificially convolving nonreverberant speech with a constant AIR. Here we demonstrate that the model is also appropriate for feature enhancement of true recordings of noisy reverberant speech. On the Multi-Channel Wall Street Journal Audio Visual corpus (MC-WSJ-AV), using the signal of a single distant microphone as input and a single recognition pass, the word error rate is cut in half to 41.9% compared to the ETSI Standard Front-End.
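
As an illustration of the artificially reverberated condition mentioned above, the following minimal sketch convolves a stand-in signal with a constant AIR. The abstract does not specify the form of the coarse AIR model; the exponentially decaying white-noise response used here, along with the names `synthetic_air`, `reverberate`, `t60_s`, and `fs_hz`, is an assumption made for illustration only and is not taken from the paper.

```python
import math
import random

def synthetic_air(t60_s, fs_hz, length_s, rng):
    """Hypothetical coarse AIR: white Gaussian noise under an exponential envelope.

    Assumption for illustration: the amplitude envelope is 10^(-3 t / T60),
    so the energy has decayed by 60 dB at t = t60_s.
    """
    n = round(length_s * fs_hz)
    h = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * (k / fs_hz) / t60_s)
         for k in range(n)]
    norm = math.sqrt(sum(x * x for x in h))
    return [x / norm for x in h]  # normalize to unit energy

def reverberate(clean, air):
    """Full linear convolution of a signal with a constant (time-invariant) AIR."""
    out = [0.0] * (len(clean) + len(air) - 1)
    for i, s in enumerate(clean):
        for j, h in enumerate(air):
            out[i + j] += s * h
    return out

rng = random.Random(0)
fs_hz = 8000
# Stand-in for 0.1 s of nonreverberant speech (white noise, not real speech).
clean = [rng.gauss(0.0, 1.0) for _ in range(800)]
air = synthetic_air(t60_s=0.3, fs_hz=fs_hz, length_s=0.05, rng=rng)
reverberant = reverberate(clean, air)
```

In real recordings such as MC-WSJ-AV, the AIR is neither known nor strictly constant and additive noise is present, which is the mismatch the abstract addresses.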

Index Terms: Bayesian feature enhancement, dereverberation, denoising


Bibliographic reference. Krueger, Alexander / Walter, Oliver / Leutnant, Volker / Haeb-Umbach, Reinhold (2012): "Bayesian feature enhancement for ASR of noisy reverberant real-world data", in INTERSPEECH-2012, 807-810.