Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Tempo Estimation by Wave Envelope for Recognition of Paralinguistic Features in Spontaneous Speech

Shigeyoshi Kitazawa, Satoshi Kobayashi, Takao Matsunaga, Hideya Ichikawa

Shizuoka University, Shizuoka, Japan

We analyze speech rate through an envelope extraction process. The process consists of low-pass filtering the rectified speech wave to eliminate ripples caused by pitch and vocal resonances. The speech wave is amplitude-modulated at about 8 mora/sec. Dips in the envelope correspond to consonants or phonemic boundaries; therefore, the number of dips per unit time is correlated with the rate of speech. We measured the rate of speech in an interview between a female interviewer and a male interviewee. The speech data analysed consist of 7 utterances by the man and 6 utterances by the woman, with durations of 2 to 7 seconds. The same utterances were labeled manually for the locations of individual phonemes. The manually computed rate excluding pauses is faster than the averaged one. By taking the DFT of the envelope, a frequency component corresponding to the rate of speech is available, and it was shown to correlate with the manual rate with a coefficient of 0.57.
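The envelope-and-DFT idea described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the moving-average smoother stands in for the unspecified low-pass filter, and the function names (`synth_am_tone`, `envelope`, `dominant_rate`), the 25 ms window, the decimation step, the 2 to 15 Hz search band, and the synthetic test signal are all assumptions introduced here.

```python
import math


def synth_am_tone(fs=1000, seconds=2.0, carrier_hz=120.0, mod_hz=8.0):
    """Hypothetical test signal: a pitch-like carrier amplitude-modulated at mod_hz."""
    n_samples = int(fs * seconds)
    return [
        (1.0 + 0.8 * math.cos(2 * math.pi * mod_hz * n / fs))
        * math.sin(2 * math.pi * carrier_hz * n / fs)
        for n in range(n_samples)
    ]


def envelope(samples, fs, window_s=0.025, step=10):
    """Rectify, smooth with a moving average (a crude low-pass), then decimate.

    Returns the decimated envelope and its sampling rate.
    """
    win = max(1, int(round(window_s * fs)))
    rectified = [abs(s) for s in samples]
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= win:
            running -= rectified[i - win]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, win))
    return smoothed[::step], fs / step


def dominant_rate(env, fs_env, f_lo=2.0, f_hi=15.0):
    """Return the frequency (Hz) of the largest DFT magnitude within [f_lo, f_hi]."""
    m = len(env)
    mean = sum(env) / m
    centered = [e - mean for e in env]  # remove the DC component
    best_f, best_mag = 0.0, -1.0
    for k in range(1, m // 2):
        f = k * fs_env / m
        if f < f_lo or f > f_hi:
            continue
        re = sum(c * math.cos(2 * math.pi * k * n / m) for n, c in enumerate(centered))
        im = -sum(c * math.sin(2 * math.pi * k * n / m) for n, c in enumerate(centered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


if __name__ == "__main__":
    fs = 1000
    env, fs_env = envelope(synth_am_tone(fs=fs), fs)
    print("estimated modulation rate (Hz):", dominant_rate(env, fs_env))
```

On this synthetic signal the estimate should land near the 8 Hz modulation rate. Real speech would need a proper low-pass filter design and handling of pauses, neither of which is attempted here.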

Bibliographic reference. Kitazawa, Shigeyoshi / Kobayashi, Satoshi / Matsunaga, Takao / Ichikawa, Hideya (1994): "Tempo estimation by wave envelope for recognition of paralinguistic features in spontaneous speech", in ICSLP-1994, 1691-1694.