5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Creating Hidden Markov Models for Fast Speech

Thilo Pfau, Guenther Ruske

Institute for Human-Machine-Communication, Technical University of Munich, Germany

This paper addresses the problem of building hidden Markov models (HMMs) suitable for fast speech. First, an automatic procedure is presented that splits speech material into categories according to speaking rate. Then the problem of sparse data for estimating HMMs for fast speech is discussed, followed by a comparison of methods to overcome it. The main emphasis is on robust reestimation techniques such as maximum a posteriori (MAP) estimation, and on methods that reduce the variability of the speech signal and thereby allow the number of HMM parameters to be reduced. Vocal tract length normalization (VTLN) is chosen for that purpose. Finally, various combinations of the methods discussed are compared on the basis of word error rates for fast speech. The best method (MAPVTLN) reduces the error rate by 10% relative to the baseline system.
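
To illustrate why MAP reestimation is robust when little fast-speech data is available, the following is a minimal sketch of the textbook MAP update for a single Gaussian mean vector. It is not taken from the paper: the function name `map_mean`, the prior weight `tau`, and the toy values are assumptions chosen for illustration, and the abstract does not state which parameters the authors adapt or how the prior is weighted.

```python
def map_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP estimate of a Gaussian mean (illustrative sketch).

    prior_mean : mean of an existing model, used as the prior (assumption:
                 e.g. a model trained on speech at a normal rate)
    frames     : feature vectors from the sparse adaptation data
    posteriors : per-frame occupation probabilities for this Gaussian
    tau        : hypothetical prior weight, must be > 0
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    occupancy = sum(posteriors)  # soft count of frames assigned here
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data, weighted by tau
    # and by the amount of data actually observed.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With no adaptation frames the estimate equals the prior mean, and as the occupancy grows it approaches the sample mean of the new data, which is the behavior that makes this kind of update attractive for sparse data.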

Bibliographic reference.  Pfau, Thilo / Ruske, Guenther (1998): "Creating hidden Markov models for fast speech", In ICSLP-1998, paper 0255.