5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Speech Recognition Using On-Line Estimation of Speaking Rate

Nelson Morgan, Eric Fosler, Nikki Mirghafori

International Computer Science Institute, Berkeley, CA, USA; University of California at Berkeley, EECS Department, Berkeley, CA, USA

In this paper, we describe a rate of speech estimator that is derived directly from the acoustic signal. This measure has been developed as an alternative to lexical measures of speaking rate such as phones or syllables per second, which, in previous work, we estimated using a first recognition pass; the accuracy of our earlier lexical rate estimate depended on the quality of recognition. Here we show that our new measure is a good predictor of word error rate, and in addition, correlates moderately well with lexical speech rate. We also show that a simple modification of the model transition probabilities based on this measure can reduce the error rate almost as much as using lexical phones per second calculated from manually transcribed data. When we categorized test utterances based on speaking rate thresholds computed from the training set, we observed that a different transition probability value was required to minimize the error rate in each speaking rate bin. However, the reduction of error provided by this approach is still small in comparison with the increases in error observed for unusually fast or slow speech.
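The binning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each utterance already has a scalar speaking rate estimate, that thresholds are taken as equal-count cut points over the training set, and that one transition probability value is stored per bin. The function names, the number of bins, and all numeric values are hypothetical placeholders; none of them come from the paper.

```python
from bisect import bisect_right


def rate_thresholds(train_rates, n_bins=3):
    """Return n_bins - 1 cut points that split the training rates
    into roughly equal-count bins (an assumed binning scheme)."""
    ordered = sorted(train_rates)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def rate_bin(rate, thresholds):
    """Index of the speaking rate bin that a test utterance falls into."""
    return bisect_right(thresholds, rate)


def transition_prob_for(rate, thresholds, transition_prob_per_bin):
    """Look up the transition probability value tuned for this rate bin."""
    return transition_prob_per_bin[rate_bin(rate, thresholds)]


# Hypothetical training-set rate estimates (arbitrary units).
train_rates = [1, 2, 3, 4, 5, 6, 7, 8, 9]
thresholds = rate_thresholds(train_rates)

# Placeholder per-bin values (slow, medium, fast); in practice each
# would be chosen to minimize word error rate on held-out data.
transition_prob_per_bin = [0.3, 0.5, 0.7]

print(thresholds)
print(transition_prob_for(8.2, thresholds, transition_prob_per_bin))
```

Because the thresholds are fixed from training data, a test utterance can be assigned to a bin from its own rate estimate alone, without reference to other test utterances.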


Bibliographic reference. Morgan, Nelson / Fosler, Eric / Mirghafori, Nikki (1997): "Speech recognition using on-line estimation of speaking rate", in EUROSPEECH-1997, 2079-2082.