Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


Acoustic Distribution Clustering in Phonetic Hidden Markov Models

M. Y. Hwang, X. D. Huang

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Output distributions in hidden Markov models describe essential acoustic characteristics. Triphone generalization may force two models to be merged when only some of their output distributions are similar and the rest differ. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. In this model, output distributions of the hidden Markov models are shared with each other if they exhibit acoustic similarity. In addition to providing a more detailed representation, distribution sharing gives us the freedom to use a large number of states for each phonetic model. Although increasing the number of states increases the total number of free parameters, with distribution sharing we can essentially eliminate the redundant states while retaining the necessary ones. With the shared-distribution model, the error rate on the DARPA Resource Management task has been reduced by 20% in comparison with the baseline SPHINX system.
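The idea of clustering at the distribution level, rather than at the model level, can be illustrated with a minimal sketch. The abstract does not state which similarity measure or stopping rule is used, so everything below is an assumption made for illustration: output distributions are taken to be discrete (codebook count vectors), similarity is measured by the count-weighted increase in entropy caused by merging, and merging stops at a fixed number of clusters. The state labels such as `k-ae+t.s0` and the function names are hypothetical.

```python
import math
from itertools import combinations


def entropy(counts):
    """Entropy (in nats) of the distribution obtained by normalizing counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def merge_cost(a, b):
    """Count-weighted entropy increase from pooling two count vectors.

    Assumed similarity measure: close to zero when the two distributions
    have the same shape, large when they differ.
    """
    merged = [x + y for x, y in zip(a, b)]
    return (sum(merged) * entropy(merged)
            - sum(a) * entropy(a)
            - sum(b) * entropy(b))


def cluster_distributions(counts_by_state, num_clusters):
    """Greedily merge the most similar output distributions.

    counts_by_state maps a state label to its codeword count vector.
    Returns a list of clusters, each a sorted list of state labels that
    would share one output distribution.
    """
    clusters = {
        i: ([name], list(counts))
        for i, (name, counts) in enumerate(counts_by_state.items())
    }
    while len(clusters) > num_clusters:
        # Pick the pair of clusters whose merge costs the least.
        i, j = min(
            combinations(clusters, 2),
            key=lambda p: merge_cost(clusters[p[0]][1], clusters[p[1]][1]),
        )
        names_j, counts_j = clusters.pop(j)
        names_i, counts_i = clusters[i]
        clusters[i] = (
            names_i + names_j,
            [x + y for x, y in zip(counts_i, counts_j)],
        )
    return [sorted(names) for names, _ in clusters.values()]


# Hypothetical example: two triphones whose first states are alike and
# whose last states are alike. Each state is clustered on its own, so no
# whole-model merge is needed to share the similar distributions.
example = {
    "k-ae+t.s0": [90, 10, 0],
    "k-ae+p.s0": [85, 15, 0],
    "k-ae+t.s2": [0, 20, 80],
    "k-ae+p.s2": [5, 10, 85],
}
shared = cluster_distributions(example, num_clusters=2)
```

Because each state's distribution is a separate clustering unit in this sketch, two models can share some distributions and keep others distinct, which is the contrast with model-level triphone generalization drawn in the abstract.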


Bibliographic reference.  Hwang, M. Y. / Huang, X. D. (1991): "Acoustic distribution clustering in phonetic hidden Markov models", In EUROSPEECH-1991, 785-788.