4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

HMMs and OWE Neural Network for Continuous Speech Recognition

Nicolas Pican, Dominique Fohr, Jean-François Mari

CRIN-CNRS & INRIA Lorraine, Vandoeuvre-les-Nancy, France

The phonetic context has a large effect on stop consonants in a continuous speech signal [1]. For this reason, recognition systems that model allophones with context-dependent Hidden Markov Models (HMMs) have been implemented [3]. HMMs are well suited to segmentation in the temporal domain [4][6], but they are weaker at recognition because Maximum Likelihood Estimation (MLE) training is not discriminative, whereas discrimination is one of the strengths of Artificial Neural Network (ANN) models. Over the last three years we have developed a new ANN model named OWE (Orthogonal Weight Estimator) [9][10]. An OWE network is an ANN that classifies an input pattern according to its contextual environment, and the architecture addresses the problem of training context-dependent behaviour. Roughly, it is built around a main MLP (Multilayer Perceptron) in which the value of each synaptic weight is estimated by another MLP (an OWE) from a representation of the context. In this paper, we present a hierarchical system for phoneme recognition: the system first segments the input signal using 48 context-independent HMMs; the stop consonants are then reordered by an OWE ANN. Experiments on TIMIT show a correct recognition rate of 78% on the 6 stop consonants (/p, t, k, b, d, g/).
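The weight-estimation idea described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class names (`TinyMLP`, `OWELayer`), the layer sizes, the tanh activations, the random initialisation, the two-dimensional context vector, and the choice to estimate biases the same way as weights are all assumptions made for illustration, and training is not shown.

```python
import math
import random


class TinyMLP:
    """Hypothetical weight estimator: context vector in, one scalar weight out."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = rng.uniform(-1, 1)

    def __call__(self, context):
        hidden = [math.tanh(sum(w * c for w, c in zip(row, context)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


class OWELayer:
    """Layer of the main MLP; every weight is produced by its own TinyMLP."""

    def __init__(self, n_in, n_out, n_context, rng, n_hidden=3):
        # One estimator per connection, plus one per bias (assumption).
        self.estimators = [[TinyMLP(n_context, n_hidden, rng) for _ in range(n_in + 1)]
                           for _ in range(n_out)]

    def weights(self, context):
        # Evaluate every estimator on the context to obtain the weight matrix.
        return [[owe(context) for owe in row] for row in self.estimators]

    def forward(self, x, context):
        out = []
        for row in self.weights(context):
            *w, bias = row
            out.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bias))
        return out


def classify(layers, x, context):
    """Run the main MLP with context-dependent weights; return the best class index."""
    for layer in layers:
        x = layer.forward(x, context)
    return max(range(len(x)), key=lambda k: x[k])


rng = random.Random(0)
STOPS = ["p", "t", "k", "b", "d", "g"]
N_FEATURES, N_CONTEXT = 4, 2          # illustrative sizes, not from the paper
layers = [OWELayer(N_FEATURES, 5, N_CONTEXT, rng),
          OWELayer(5, len(STOPS), N_CONTEXT, rng)]

frame = [0.2, -0.1, 0.7, 0.05]        # made-up acoustic feature vector
label_a = STOPS[classify(layers, frame, [1.0, 0.0])]   # same input, context A
label_b = STOPS[classify(layers, frame, [0.0, 1.0])]   # same input, context B
```

Because the network here is untrained, the labels themselves carry no meaning; the point is only that the same input frame is processed with a different weight matrix for each context.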


Bibliographic reference. Pican, Nicolas / Fohr, Dominique / Mari, Jean-François (1996): "HMMs and OWE neural network for continuous speech recognition", in ICSLP-1996, 1309-1312.