EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


MAP Combination of Multi-Stream HMM or HMM/ANN Experts

Andrew Morris, Astrid Hagen, Hervé Bourlard

IDIAP, Switzerland

Automatic speech recognition (ASR) performance falls dramatically as the mismatch between training and test data increases. The human ability to recognise speech when a large proportion of frequencies is dominated by noise has inspired the "missing data" and "multi-band" approaches to noise-robust ASR. "Missing data" ASR identifies low-SNR spectral data in each data frame and then ignores it. Multi-band ASR trains a separate model for each position of missing data, estimates a reliability weight for each model, then combines the model outputs in a weighted sum. A problem with both approaches is that estimating local data reliability is inherently inaccurate, and it also assumes that all of the training data was clean. In this article we present a model in which adaptive multi-band expert weighting is incorporated naturally into the maximum a posteriori (MAP) decoding process.
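
The weighted-sum combination used in conventional multi-band ASR, as described above, can be illustrated with a minimal sketch. This is not the MAP model presented in the paper; it only shows the baseline idea of combining per-expert class posteriors for a single frame with reliability weights. The function names, the example posteriors, and the weight values are all hypothetical and chosen purely for illustration.

```python
def combine_experts(expert_posteriors, weights):
    """Combine per-expert class posteriors for one frame in a weighted sum.

    expert_posteriors: list of lists, one posterior vector per expert
                       (hypothetical values; each vector sums to 1).
    weights:           one non-negative reliability weight per expert
                       (assumed to be estimated elsewhere).
    """
    if len(expert_posteriors) != len(weights):
        raise ValueError("need exactly one weight per expert")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    # Normalise the weights so the combined vector is still a distribution.
    normalised = [w / total for w in weights]

    n_classes = len(expert_posteriors[0])
    combined = [0.0] * n_classes
    for w, posterior in zip(normalised, expert_posteriors):
        for k, p in enumerate(posterior):
            combined[k] += w * p
    return combined


def map_class(posteriors):
    """Return the index of the class with the highest combined posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


# Two hypothetical sub-band experts that disagree about a three-class frame.
experts = [[0.7, 0.2, 0.1],
           [0.1, 0.8, 0.1]]

trusting_first = combine_experts(experts, [0.75, 0.25])
trusting_second = combine_experts(experts, [0.25, 0.75])
```

With these illustrative numbers the decision follows whichever expert receives the larger weight, which is why inaccurate reliability estimates translate directly into recognition errors.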


Bibliographic reference. Morris, Andrew / Hagen, Astrid / Bourlard, Hervé (2001): "MAP combination of multi-stream HMM or HMM/ANN experts", In EUROSPEECH-2001, 225-228.