First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes a speaker-independent continuous speech recognition system that we designed and implemented, and reports its major features together with experimental results on a subset of the TIMIT database. The system is based on hidden Markov modeling of phoneme-sized acoustic units using continuous Gaussian mixture densities. The mixture densities are generated by an algorithm that minimizes the average trace of the mixture components and exploits the segmental structure of the speech signal. Methods for preparing a decoding dictionary and for controlling grammar perplexity are also described. On a subset of the TIMIT database with 443 words and a grammar perplexity of 49, the system achieved 87.4% sentence correct, 97.9% word correct, and 97.5% word accuracy on the training set; on the test set, it achieved 72.4% sentence correct, 93.9% word correct, and 92.4% word accuracy.
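The abstract reports both "word correct" and "word accuracy" without defining them. A minimal sketch of the distinction follows, assuming the conventional scoring definitions (this is an assumption; the abstract does not state which definitions were used): word correct counts only substitutions and deletions against the reference, while word accuracy also penalizes insertions. The class `AlignmentCounts` and the counts below are hypothetical, chosen only so that the output matches the reported test-set percentages; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Error counts from aligning decoded words against a reference transcript."""
    n_ref: int          # number of words in the reference transcripts
    substitutions: int  # reference words replaced by a different word
    deletions: int      # reference words missing from the decoded output
    insertions: int     # decoded words with no counterpart in the reference


def word_correct(c: AlignmentCounts) -> float:
    """Percent of reference words recognized correctly; insertions are ignored."""
    hits = c.n_ref - c.substitutions - c.deletions
    return 100.0 * hits / c.n_ref


def word_accuracy(c: AlignmentCounts) -> float:
    """Percent of reference words correct after also subtracting insertions."""
    hits = c.n_ref - c.substitutions - c.deletions
    return 100.0 * (hits - c.insertions) / c.n_ref


# Hypothetical counts for illustration only (not from the paper).
counts = AlignmentCounts(n_ref=1000, substitutions=40, deletions=21, insertions=15)
print(f"word correct:  {word_correct(counts):.1f}%")
print(f"word accuracy: {word_accuracy(counts):.1f}%")
```

Under these assumed definitions, the gap between the two figures equals the insertion rate relative to the number of reference words, which would correspond to 0.4% on the training set and 1.5% on the test set.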
Bibliographic reference. Zhao, Yunxin / Wakita, Hisashi (1990): "Experiments with a speaker-independent continuous speech recognition system on the TIMIT database", In ICSLP-1990, 697-700.