We propose an exact maximum likelihood (ML) approach for hidden Markov modeling of speech signals using models with mixtures of Gaussian autoregressive (AR) output probability distributions. This approach differs from the commonly used one in two respects. First, the parameters of the AR models are calculated using the exact, rather than the asymptotic, form of the likelihood function. Second, the gain of each AR model, as well as its shape, is estimated and used during the recognition phase. Since the asymptotic likelihood is appropriate only for sources that are stationary in some sense, the ML approach taken here can be considered an approach to nonstationary modeling. The proposed approach was tested on the task of recognizing isolated utterances of the letters of the English alphabet spoken by four different speakers, using a system that was trained simultaneously on all four talkers (a multi-speaker recognizer). The approach yields a recognition accuracy comparable to that obtained with the asymptotic ML approach.
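To make the distinction between the exact and the asymptotic likelihood concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian AR frame whose covariance is gain * (A^T A)^{-1}, where A is the lower-triangular Toeplitz matrix built from the coefficients [1, a_1, ..., a_p], and it assumes the asymptotic form replaces the quadratic term with the autocorrelation expression often used in the AR-HMM literature. The function names and the `gain` parameter are hypothetical, chosen only for illustration.

```python
import math


def _autocorr(seq, lag):
    """Unnormalized autocorrelation of a finite sequence at a non-negative lag."""
    return sum(seq[n] * seq[n + lag] for n in range(len(seq) - lag))


def exact_ar_loglik(x, a, gain):
    """Exact Gaussian AR log-likelihood of frame x (assumed form).

    Assumes zero initial conditions, so the quadratic term is the energy of
    the prediction residual computed over the K samples of the frame only.
    """
    coeffs = [1.0] + list(a)
    K = len(x)
    energy = 0.0
    for t in range(K):
        # Residual e[t] = x[t] + a_1 x[t-1] + ... + a_p x[t-p], with x[t] = 0 for t < 0.
        e = sum(c * x[t - i] for i, c in enumerate(coeffs) if t - i >= 0)
        energy += e * e
    return -0.5 * K * math.log(2.0 * math.pi * gain) - energy / (2.0 * gain)


def asymptotic_ar_loglik(x, a, gain):
    """Asymptotic approximation (assumed form): the quadratic term becomes
    r_a(0) r_x(0) + 2 * sum_{i=1..p} r_a(i) r_x(i)."""
    coeffs = [1.0] + list(a)
    K = len(x)
    alpha = _autocorr(coeffs, 0) * _autocorr(x, 0)
    for i in range(1, len(coeffs)):
        alpha += 2.0 * _autocorr(coeffs, i) * _autocorr(x, i)
    return -0.5 * K * math.log(2.0 * math.pi * gain) - alpha / (2.0 * gain)


def ml_gain(x, a):
    """ML gain under the exact form above: mean squared prediction residual."""
    coeffs = [1.0] + list(a)
    K = len(x)
    energy = 0.0
    for t in range(K):
        e = sum(c * x[t - i] for i, c in enumerate(coeffs) if t - i >= 0)
        energy += e * e
    return energy / K
```

Under these assumptions the two forms differ only by the residual energy that the asymptotic expression accumulates past the end of the frame, which becomes negligible relative to the total as the frame grows longer. This is one way to see why the asymptotic form suits long, approximately stationary frames.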
Bibliographic reference. Serralheiro, A. J. / Ephraim, Y. / Rabiner, L. R. (1989): "On nonstationary hidden Markov modeling of speech signals", in EUROSPEECH-1989, 1159-1162.