13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Using Sparse Classification Outputs as Feature Observations for Noise-robust ASR

Yang Sun (1,3), Bert Cranen (1), Jort F. Gemmeke (2), Lou Boves (1), Louis ten Bosch (1), Mathew M. Doss (3)

(1) Centre for Language and Speech Technology, Radboud University Nijmegen, the Netherlands
(2) Department ESAT, KU Leuven, Belgium
(3) Idiap Research Institute, Martigny, Switzerland

Sparse Classification (SC) is an exemplar-based approach to Automatic Speech Recognition. By representing noisy speech as a sparse linear combination of speech and noise exemplars, SC makes it possible to separate speech from noise. The approach has proven robust in noisy conditions, but at the cost of degraded performance in clean conditions. In this work, rather than using the state probability estimates obtained with SC directly in Viterbi decoding, we model the probability distributions of SC with Gaussian Mixture Models (GMMs), for which purpose we introduce a novel whitening transformation. Results on the AURORA-2 task show that the proposed approach is especially effective in clean speech and in the matched noise conditions of test set A. Except in the -5 dB SNR condition, we also find substantial improvements in the non-matched noise conditions of test set B.
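The decomposition step described above, in which a noisy observation is written as a sparse linear combination of speech and noise exemplars, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the random exemplar dictionaries, the function name sparse_activations, the sparsity weight, and the choice of multiplicative updates for a generalized Kullback-Leibler divergence with an L1 penalty as the solver. The whitening transformation and GMM modeling stages are not covered.

```python
import numpy as np

def sparse_activations(y, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations x such that y is approximately A @ x.

    Hypothetical helper: uses multiplicative updates for the generalized
    KL divergence with an L1 penalty (an assumed solver for this sketch).
    """
    x = np.ones(A.shape[1])
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)                     # element-wise misfit
        x *= (A.T @ ratio) / (col_sums + sparsity + eps)
    return x

# Toy dictionaries with random non-negative exemplars (illustrative only).
rng = np.random.default_rng(0)
n_dims, n_speech, n_noise = 20, 30, 10
A_speech = rng.random((n_dims, n_speech))
A_noise = rng.random((n_dims, n_noise))
A = np.hstack([A_speech, A_noise])

# Synthetic "noisy" observation: one speech exemplar plus one noise exemplar.
y = 1.0 * A_speech[:, 3] + 0.5 * A_noise[:, 7]

x = sparse_activations(y, A)

# Splitting the activations by dictionary gives separate speech and
# noise estimates, which is what allows speech to be separated from noise.
x_speech, x_noise = x[:n_speech], x[n_speech:]
speech_est = A_speech @ x_speech
noise_est = A_noise @ x_noise
```

In a sketch like this, the speech activations would be the quantities carried forward to later stages, while the noise activations absorb the part of the observation that the noise exemplars explain.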

Index Terms: template-based ASR, noise robustness, speech modeling

Bibliographic reference.  Sun, Yang / Cranen, Bert / Gemmeke, Jort F. / Boves, Lou / Bosch, Louis ten / Doss, Mathew M. (2012): "Using sparse classification outputs as feature observations for noise-robust ASR", In INTERSPEECH-2012, 2142-2145.