Third International Conference on Spoken Language Processing (ICSLP 94)
We analyze speech rate through an envelope extraction process. The process low-pass filters the rectified speech waveform to eliminate ripples caused by pitch and vocal resonances. The speech waveform is amplitude-modulated at about 8 morae per second. Dips in the envelope correspond to consonants or phonemic boundaries, so the number of dips per unit time is correlated with the rate of speech. We measured the rate of speech in an interview between a female interviewer and a male interviewee. The speech data analyzed consist of 7 utterances by the man and 6 utterances by the woman, with durations of 2 to 7 seconds. The same utterances were labeled manually for the locations of individual phonemes. The manually computed rate excluding pauses is faster than the averaged rate. A DFT of the envelope yields a frequency component corresponding to the rate of speech, which was shown to correlate with the manually computed rate with a coefficient of 0.57.
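The envelope-and-DFT procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `envelope` and `dominant_modulation_hz` are hypothetical, a moving average stands in for the unspecified low-pass filter, and the 50 ms smoothing window and the 2 to 16 Hz search band are assumptions chosen only so that a modulation near 8 per second falls inside the band. Dip counting is not shown.

```python
import math


def envelope(signal, sample_rate, window_sec=0.05):
    """Full-wave rectify, then smooth with a causal moving average.

    The moving average is an assumed stand-in for the low-pass filter;
    window_sec (50 ms) is an illustrative choice, not a value from the paper.
    """
    rectified = [abs(x) for x in signal]
    win = max(1, int(window_sec * sample_rate))
    env = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= win:
            running -= rectified[i - win]  # drop the sample leaving the window
        env.append(running / min(i + 1, win))
    return env


def dominant_modulation_hz(env, sample_rate, f_min=2.0, f_max=16.0, step=0.25):
    """Return the candidate frequency with the largest DFT magnitude.

    The envelope mean is removed first so the DC term does not dominate.
    The search band and step are assumptions made for this sketch.
    """
    mean = sum(env) / len(env)
    centered = [v - mean for v in env]
    n_steps = int(round((f_max - f_min) / step))
    best_f, best_mag = f_min, -1.0
    for k in range(n_steps + 1):
        f = f_min + k * step
        w = 2.0 * math.pi * f / sample_rate
        re = sum(v * math.cos(w * n) for n, v in enumerate(centered))
        im = sum(v * math.sin(w * n) for n, v in enumerate(centered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


if __name__ == "__main__":
    # Synthetic stand-in for speech: a 200 Hz tone amplitude-modulated at 8 Hz.
    sr = 2000
    tone = [
        (1.0 + 0.8 * math.cos(2 * math.pi * 8 * n / sr))
        * math.sin(2 * math.pi * 200 * n / sr)
        for n in range(2 * sr)
    ]
    print(dominant_modulation_hz(envelope(tone, sr), sr))
```

On real recordings the peak would be far less clean than on a synthetic tone, which is consistent with the moderate correlation reported in the abstract; the sketch only shows the shape of the computation.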
Bibliographic reference. Kitazawa, Shigeyoshi / Kobayashi, Satoshi / Matsunaga, Takao / Ichikawa, Hideya (1994): "Tempo estimation by wave envelope for recognition of paralinguistic features in spontaneous speech", In ICSLP-1994, 1691-1694.