Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Spoken Word Recognition Using Phoneme Duration Information Estimated from Speaking Rate of Input Speech

Yukihiro Osaka, Shozo Makino, Toshio Sone

Graduate School of Information Sciences, Tohoku University, Sendai, Japan

This paper describes a spoken word recognition system based on phoneme durations estimated from the speaking rate of the input speech. We found that normalizing phoneme duration by the average vowel duration of the input speech and by the average duration of each phoneme class was very effective in reducing the variation of phoneme duration. For the normalization, we propose a first-order linear regression equation, a function of the average vowel duration, for estimating the duration of each phoneme in the input speech. We applied this method to isolated spoken word recognition. We prepared several kinds of equations that take various phoneme contexts into account and compared them by their word recognition scores. The word recognition score was 97.3% for a 212-word vocabulary, using the equation based on a weighted sum of two estimates, one dependent on the preceding phoneme and one dependent on the following phoneme. The score increased by 1.6% compared with that obtained without speaking-rate information.
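As a rough illustration of the idea in the abstract, the sketch below fits a first-order linear regression that predicts a phoneme's duration from the average vowel duration of the utterance, then combines a preceding-context estimate and a following-context estimate by a weighted sum. Everything specific here is an assumption made for illustration: the function names (`fit_line`, `estimate_duration`, `combine`), the toy duration values in milliseconds, and the equal weight of 0.5. The abstract does not give the paper's actual coefficients, contexts, or weights.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def estimate_duration(coeffs, avg_vowel_duration):
    """Predict a phoneme duration from the utterance's average vowel duration."""
    a, b = coeffs
    return a * avg_vowel_duration + b


def combine(est_preceding, est_following, weight=0.5):
    """Weighted sum of the two context-dependent estimates (weight is hypothetical)."""
    return weight * est_preceding + (1.0 - weight) * est_following


# Made-up training data (ms): average vowel duration per utterance, and the
# observed duration of one phoneme in two hypothetical context conditions.
avg_vowel = [60.0, 80.0, 100.0, 120.0]
dur_given_preceding = [50.0, 60.0, 70.0, 80.0]
dur_given_following = [56.0, 68.0, 80.0, 92.0]

coeffs_prec = fit_line(avg_vowel, dur_given_preceding)
coeffs_foll = fit_line(avg_vowel, dur_given_following)

# Estimate the phoneme duration for a new utterance whose average vowel
# duration is 90 ms, then merge the two context-dependent estimates.
est_prec = estimate_duration(coeffs_prec, 90.0)
est_foll = estimate_duration(coeffs_foll, 90.0)
expected_duration = combine(est_prec, est_foll)
```

In a recognizer, an estimate like `expected_duration` could serve as the reference against which an observed phoneme duration is scored, so that fast and slow speakers are judged against durations adapted to their own speaking rate.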


Bibliographic reference.  Osaka, Yukihiro / Makino, Shozo / Sone, Toshio (1994): "Spoken word recognition using phoneme duration information estimated from speaking rate of input speech", In ICSLP-1994, 191-194.