4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Estimation of Statistical Phoneme Center Considering Phonemic Environments

Shigeki Okawa, Katsuhiko Shirai

Department of Information and Computer Science, Waseda University, Shinjuku, Tokyo, Japan

This paper presents a new acoustic modeling scheme for speech recognition based on the idea of a Statistical Phoneme Center (SPC). The SPC has several properties that make more reliable phoneme extraction feasible. First, we assume that every phoneme has a fictitious center point. The center is determined statistically by an iterative procedure that maximizes the local likelihood over a large amount of speech data. Next, to evaluate the performance of phoneme extraction, phoneme recognition is performed by optimizing the likelihood with a Dynamic Time Warping technique. In the experiments, a recognition accuracy of 71.6% is obtained for speaker-independent phoneme recognition. This result demonstrates that the proposed SPC is an effective new concept for obtaining more stable acoustic models for speaker-independent speech recognition.
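The abstract does not spell out the iterative center estimation, so the following is only a minimal sketch of one way such a procedure could look, not the authors' algorithm. It assumes one-dimensional features, a single Gaussian per phoneme, and a temporal-midpoint initialization. The names `estimate_centers`, `gaussian_loglik`, `n_iter`, and `var_floor` are hypothetical. The loop alternates between fitting the Gaussian to the frames currently chosen as centers and moving each center to the frame with the highest log-likelihood under that Gaussian.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian (assumed model, not from the paper)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def estimate_centers(segments, n_iter=10, var_floor=1e-3):
    """Hypothetical sketch of iterative center estimation for one phoneme.

    segments: list of tokens of the same phoneme, each a list of
              1-D feature values (one value per frame).
    Returns (centers, mean, var), where centers[i] is the frame index
    chosen as the center of segments[i].
    """
    # Assumption: start from the temporal midpoint of each token.
    centers = [len(seg) // 2 for seg in segments]
    mean, var = 0.0, 1.0
    for _ in range(n_iter):
        # Step 1: fit the Gaussian to the frames at the current centers.
        values = [seg[c] for seg, c in zip(segments, centers)]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        var = max(var, var_floor)  # keep the variance away from zero

        # Step 2: move each center to the frame that is most likely
        # under the current Gaussian (local likelihood maximization).
        new_centers = [
            max(range(len(seg)),
                key=lambda t: gaussian_loglik(seg[t], mean, var))
            for seg in segments
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, mean, var


if __name__ == "__main__":
    # Toy data: three tokens, each with one frame near 5.0.
    toy = [[0.0, 5.0, 1.0], [2.0, 4.8, 5.1, 0.5], [5.2, 9.0]]
    print(estimate_centers(toy))
```

On the toy data, the third token's center starts at its last frame and moves to its first frame once the Gaussian has been fitted, which illustrates how the centers and the model are refined together. A real system would use multidimensional spectral features and, given the paper's title, models that depend on the phonemic environment.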


Bibliographic reference.  Okawa, Shigeki / Shirai, Katsuhiko (1996): "Estimation of statistical phoneme center considering phonemic environments", In ICSLP-1996, 1069-1072.