13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Log-spectral Feature Reconstruction Based on an Occlusion Model for Noise Robust Speech Recognition

Jose A. González (1), Antonio M. Peinado (1), Angel M. Gómez (1), Ning Ma (2)

(1) Dpt. Signal Theory, Telematics, and Communications, University of Granada, Spain
(2) Dpt. Computer Science, University of Sheffield, UK

This paper addresses the problem of feature compensation in the log-spectral domain for speech recognition in noise by means of minimum mean square error (MMSE) estimation assuming an occlusion speech/noise model. Under this model, the usual non-linear mismatch function that represents the speech distortion due to additive noise can be reasonably well approximated by the maximum of the two mixing sources (speech and noise). Using this approximation, we propose to enhance the degraded speech features by means of a novel MMSE estimator. The resulting technique shows clear similarities with soft-mask missing-data (MD) reconstruction. Nevertheless, experimental results on both the Aurora-2 and Aurora-4 databases show the effectiveness of the proposed technique in comparison with MD.
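The max approximation behind the occlusion model can be illustrated numerically. The sketch below assumes the common log-spectral mismatch function y = log(exp(x) + exp(n)), where x and n are the clean-speech and noise log-spectral energies in one channel. The abstract does not spell out this form, so it is an assumption here, and the function names are hypothetical. It is not the MMSE estimator proposed in the paper; it only shows how close max(x, n) is to the assumed mismatch function.

```python
import math


def log_mismatch(x, n):
    """Assumed mismatch function y = log(exp(x) + exp(n)).

    Rewritten as max(x, n) + log(1 + exp(-|x - n|)) for numerical stability.
    """
    return max(x, n) + math.log1p(math.exp(-abs(x - n)))


def occlusion_approx(x, n):
    """Occlusion (max) approximation: the dominant source masks the other."""
    return max(x, n)


def approx_error(x, n):
    """Gap between the assumed mismatch function and the max approximation.

    Equals log(1 + exp(-|x - n|)): always positive, at most log(2),
    and shrinking quickly as one source dominates the other.
    """
    return log_mismatch(x, n) - occlusion_approx(x, n)


if __name__ == "__main__":
    # Hypothetical log-energy values, chosen only for illustration.
    for x, n in [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0), (0.0, 10.0)]:
        print(f"x={x:5.1f} n={n:5.1f} error={approx_error(x, n):.6f}")
```

Under this assumption, the approximation error is largest (log 2) when speech and noise have equal energy and becomes negligible once they differ by a few log units, which is consistent with the abstract's claim that the maximum approximates the mismatch function reasonably well.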

Index Terms: feature compensation, MMSE estimation, missing data imputation, speech recognition


Bibliographic reference. González, Jose A. / Peinado, Antonio M. / Gómez, Angel M. / Ma, Ning (2012): "Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition", in INTERSPEECH-2012, 2630-2633.