Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


Phonetic Context in Hybrid HMM/MLP Continuous Speech Recognition

Nelson Morgan (1), Hervé Bourlard (1,2), C. Wooters (1), Phil Kohn (1), M. Cohen (3)

(1) International Computer Science Institute, Berkeley, CA, USA
(2) Lernout & Hauspie Speechproducts, Wemmel, Belgium
(3) SRI International, Menlo Park, CA, USA

Earlier work has shown that Multilayer Perceptrons (MLPs) can estimate emission probabilities for a Hidden Markov Model (HMM) [1][2][3]. In those reports, we showed that these estimates improved performance over counting estimation techniques when a fairly simple HMM was used. However, current state-of-the-art continuous speech recognizers require HMMs of greater complexity, e.g. multiple densities per phone and/or context-dependent phone models. Brute-force application of our earlier techniques to triphones (the standard approach to context-dependent HMMs) would result in an output layer with many thousands of units and many millions of connections to train. In this report we describe another approach to applying MLPs to context-dependent probability density estimation, as well as some practical aspects of implementing the method efficiently.
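The scaling concern raised above can be illustrated with a rough count of weights in a fully connected MLP with one hidden layer. This is only a sketch: the input, hidden, phone, and triphone sizes below are hypothetical values chosen for illustration and are not taken from the paper.

```python
def mlp_connections(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Count weights in a fully connected one-hidden-layer MLP (biases ignored)."""
    return n_inputs * n_hidden + n_hidden * n_outputs


# All sizes below are assumptions for illustration, not figures from the paper.
N_INPUTS = 9 * 26     # hypothetical: 9 frames of context x 26 features per frame
N_HIDDEN = 1000       # hypothetical hidden-layer size
N_PHONES = 60         # hypothetical context-independent phone set
N_TRIPHONES = 5000    # hypothetical number of modelled triphones

# One output unit per phone versus one output unit per triphone.
context_independent = mlp_connections(N_INPUTS, N_HIDDEN, N_PHONES)
brute_force_triphone = mlp_connections(N_INPUTS, N_HIDDEN, N_TRIPHONES)

print(f"phone outputs:    {context_independent:,} connections")
print(f"triphone outputs: {brute_force_triphone:,} connections")
```

Under these assumed sizes the hidden-to-output weights dominate once there is one output unit per triphone, which is why the brute-force network grows into the millions of connections.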


Bibliographic reference. Morgan, Nelson / Bourlard, Hervé / Wooters, C. / Kohn, Phil / Cohen, M. (1991): "Phonetic context in hybrid HMM/MLP continuous speech recognition", in EUROSPEECH-1991, 109-112.