5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Statistical Modeling of Pronunciation and Production Variations for Speech Recognition

Filipp Korkmazskiy, Biing-Hwang Juang

Lucent Technologies, Bell Laboratories, USA

In this paper, we propose a procedure for training a pronunciation network with criteria consistent with the optimality objectives of speech recognition systems. In particular, we describe a framework for using maximum likelihood (ML) and minimum classification error (MCE) criteria for pronunciation network optimization. The ML criterion is used to obtain an optimal structure for the pronunciation network based on statistically derived phonological rules. Discrimination among different pronunciation networks is achieved by weighting the networks, with the weights optimized under the MCE criterion. Experimental results demonstrate improvements in speech recognition accuracy after applying the statistically derived phonological rules. It is shown that the impact of pronunciation network weighting on recognition performance is determined by the size of the recognition vocabulary.

Bibliographic reference. Korkmazskiy, Filipp / Juang, Biing-Hwang (1998): "Statistical modeling of pronunciation and production variations for speech recognition", in ICSLP-1998, paper 0345.