Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Modeling Dynamics in Connectionist Speech Recognition - The Time Index Model

Yochai Konig, Nelson Morgan

International Computer Science Institute, Berkeley, CA, USA

We are experimenting with an approach to connectionist speech recognition that models the dynamics within a speech segment using temporal position as an explicit variable. The most common model of human speech production currently used in speech recognition is the Hidden Markov Model (HMM). However, HMMs suffer from well-known limitations, most notably the assumption that the observations generated in a given state are independent and identically distributed (i.i.d.). As an alternative, we are developing a time index model that explicitly conditions the emission probability of a state on the time index, where the time index is defined as the number of frames from entry into a state up to the current frame. Thus, the proposed model does not require the i.i.d. assumption. Our pilot results suggest that the time-index approach can greatly reduce error if we have good information about the phoneme boundary locations.
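
As an illustration of the time index defined above, the following minimal Python sketch derives a per-frame time index from a frame-level state alignment. It is not taken from the paper: the function name `time_indices`, the example labels, and the convention that the entry frame has index 1 are all assumptions made for this example. The sketch also assumes that a change of label marks entry into a new state, so two adjacent segments with the same label would be counted as one.

```python
def time_indices(states):
    """Return, for each frame, how many frames have been spent in the
    current state so far (assumed convention: the entry frame counts as 1)."""
    indices = []
    previous = None
    count = 0
    for state in states:
        if state == previous:
            count += 1   # still in the same state: one more frame elapsed
        else:
            count = 1    # label changed: a new state was entered
        indices.append(count)
        previous = state
    return indices


# Hypothetical frame-level alignment (one state label per frame).
alignment = ["sil", "sil", "ae", "ae", "ae", "t"]
for label, index in zip(alignment, time_indices(alignment)):
    print(label, index)
```

In a standard HMM the emission probability of a frame depends only on the state label. In a time index model it would depend on the (state, time index) pair, which is why accurate segment boundaries matter: a misplaced boundary shifts every time index that follows it within the segment.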

Bibliographic reference. Konig, Yochai / Morgan, Nelson (1994): "Modeling dynamics in connectionist speech recognition - the time index model", in ICSLP-1994, 1523-1526.