Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
We present experiments in using neural-network-based methods to initialize continuous observation density hidden Markov models (CDHMMs). Proper initialization is an easy way to avoid an excessive number of iterations when maximum likelihood (ML) algorithms are used to estimate the parameters of CDHMMs. This is important in, for example, phoneme-based automatic speech recognition, where the output density functions of the HMM states are complex and a large amount of training data must be used. In our work, CDHMMs are used as phoneme models in the task of transcribing speech into phoneme sequences. The probability density function of the output distribution of a state is approximated by a mixture of a large number of multivariate Gaussian density functions (typically 25). We present experiments in initializing the means of the mixture Gaussians with Self-Organizing Maps (SOMs) and Learning Vector Quantization (LVQ). The results indicate that initialization by SOMs speeds up convergence in ML parameter estimation when the error rate is used as the measure. The same applies to LVQ, especially when combined with the segmental K-means algorithm.
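As an illustration of the idea of SOM-based initialization, the following minimal sketch trains a one-dimensional SOM on the feature vectors assigned to one HMM state and uses the resulting codebook as the initial means of that state's Gaussian mixture. It is not the authors' implementation: the function name `som_init_means`, the 1-D map topology, the Gaussian neighborhood, and the linear decay schedules are all assumptions made for the example. Only the mixture size of 25 comes from the abstract.

```python
import numpy as np

def som_init_means(features, n_units=25, n_iters=2000, seed=0):
    """Train a 1-D SOM and return its codebook as initial mixture means.

    Hypothetical helper: topology and schedules are illustrative assumptions.
    Requires len(features) >= n_units.
    """
    rng = np.random.default_rng(seed)
    # Start the codebook from randomly chosen training vectors.
    idx = rng.choice(len(features), size=n_units, replace=False)
    codebook = features[idx].astype(float)
    positions = np.arange(n_units)  # unit coordinates on the 1-D map
    for t in range(n_iters):
        frac = t / n_iters
        lr = 0.5 * (1.0 - frac)  # linearly decaying learning rate
        radius = max(n_units / 2.0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        x = features[rng.integers(len(features))]  # random training sample
        # Best-matching unit: codebook vector closest to x.
        winner = np.argmin(np.sum((codebook - x) ** 2, axis=1))
        # Gaussian neighborhood around the winner on the map.
        h = np.exp(-((positions - winner) ** 2) / (2.0 * radius ** 2))
        # Move each unit toward x in proportion to its neighborhood weight.
        codebook += lr * h[:, None] * (x - codebook)
    return codebook
```

The returned array has one row per mixture component and could serve as the starting point for ML re-estimation of a state's output density. Mixture weights and covariances would still need their own initialization, which this sketch does not cover.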
Bibliographic reference. Kurimo, Mikko / Torkkola, Kari (1992): "Application of self-organizing maps and LVQ in training continuous density hidden Markov models for phonemes", In ICSLP-1992, 543-546.