EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Modeling Pronunciation Variation Using Context-Dependent Weighting and B/S Refined Acoustic Modeling

Fang Zheng (1), Zhanjiang Song (1), Pascale Fung (2), William Byrne (3)

(1) Tsinghua University, China
(2) University of Science and Technology, Hong Kong
(3) The Johns Hopkins University, USA

Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition systems. By studying the initial/final (IF) characteristics of the Chinese language and developing the Bayesian equation, we propose the concepts of the generalized initial/final (GIF) and the generalized syllable (GS), the GIF modeling method and the IF-GIF modeling method, and a context-dependent pronunciation weighting method. With these approaches, IF-GIF modeling reduces the Chinese syllable error rate (SER) by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used.

Bibliographic reference. Zheng, Fang / Song, Zhanjiang / Fung, Pascale / Byrne, William (2001): "Modeling pronunciation variation using context-dependent weighting and B/S refined acoustic modeling", in EUROSPEECH-2001, 57-60.