ISCA Archive Interspeech 2013

Multi-stream recognition of noisy speech with performance monitoring

Ehsan Variani, Feipeng Li, Hynek Hermansky

A prototype multi-stream system with a performance monitor for stream selection is proposed to recognize speech in unknown noise. The speech signal is decomposed into seven band-limited streams. Posterior probabilities of phonemes are estimated by a multi-layer perceptron (MLP) in each of these band-limited streams. Estimated posterior vectors of all 127 combinations (processing streams) of the seven band-limited streams form inputs to a second-stage MLP that estimates posterior probabilities of phonemes in each processing stream. A performance monitor is designed to predict the reliability of individual processing streams based on the outputs from these streams. The top N streams that are least affected by noise are selected and their outputs are averaged to yield the final posterior probability vector used in Viterbi search for the best phoneme sequence. Experimental results show that the proposed technique is effective in dealing with noise.
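The stream enumeration and top-N fusion described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the plain-list representation of posterior vectors, and the `scores` argument (standing in for whatever reliability measure the performance monitor produces, with higher assumed to mean more reliable) are all assumptions introduced here. The MLP stages and the monitor itself are not modeled.

```python
from itertools import combinations

NUM_BANDS = 7  # the abstract states seven band-limited streams


def enumerate_processing_streams(num_bands=NUM_BANDS):
    """Return every non-empty combination of band indices.

    For seven bands this yields 2**7 - 1 = 127 processing streams.
    """
    streams = []
    for size in range(1, num_bands + 1):
        streams.extend(combinations(range(num_bands), size))
    return streams


def fuse_top_n(posteriors, scores, top_n):
    """Average the posterior vectors of the top_n highest-scoring streams.

    posteriors: one posterior vector (list of floats) per processing stream.
    scores: hypothetical per-stream reliability scores (assumption: higher
            means the stream is judged less affected by noise).
    """
    if len(posteriors) != len(scores):
        raise ValueError("need exactly one score per stream")
    if not 1 <= top_n <= len(posteriors):
        raise ValueError("top_n must be between 1 and the number of streams")

    # Rank stream indices from most to least reliable and keep the best top_n.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_n]

    # Element-wise mean over the selected streams.
    dim = len(posteriors[0])
    return [sum(posteriors[i][k] for i in chosen) / top_n for k in range(dim)]
```

With three toy streams over two phoneme classes, `fuse_top_n([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.9, 0.1, 0.8], 2)` averages the first and third vectors, since they carry the two highest scores. In a full system the fused vector would be computed per frame and passed on to the Viterbi search.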

doi: 10.21437/Interspeech.2013-273

Cite as: Variani, E., Li, F., Hermansky, H. (2013) Multi-stream recognition of noisy speech with performance monitoring. Proc. Interspeech 2013, 2978-2981, doi: 10.21437/Interspeech.2013-273

  author={Ehsan Variani and Feipeng Li and Hynek Hermansky},
  title={{Multi-stream recognition of noisy speech with performance monitoring}},
  booktitle={Proc. Interspeech 2013},