EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Discriminative Disfluency Modeling for Spontaneous Speech Recognition

Chung-Hsien Wu, Gwo-Lang Yan

National Cheng Kung University, Taiwan, ROC

Most automatic speech recognizers (ASRs) have concentrated on read speech, which differs from speech containing disfluencies. These ASRs cannot handle speech with a high rate of disfluencies, such as the filled pauses, repetitions, repairs, false starts, and silent pauses found in actual spontaneous speech or dialogues. In this paper, we focus on modeling the filled pauses "uh" and "um". Filled pauses exhibit nasal and lengthening characteristics, and the acoustic parameters for these characteristics are analyzed and adopted for disfluency modeling. A Gaussian mixture model (GMM), trained by a discriminative training algorithm that minimizes the recognition error, is proposed. A transition probability density function is defined from the GMM and used to weight the transition probability at the boundaries between fluency and disfluency models in the one-stage algorithm. Experimental results show that the proposed method yields an improvement rate of 27.3% for disfluency compared to the baseline system.
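
The abstract does not give the exact form of the transition probability density function, so the following is only a minimal sketch of the general idea: score a boundary frame with a diagonal-covariance GMM and use that score to weight a transition log-probability. The function names, the log-linear combination, and the scaling factor `alpha` are all assumptions made for illustration, not the authors' formulation, and the discriminative training of the GMM parameters is not shown.

```python
import math


def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights: list of mixture weights (summing to 1)
    means, variances: one list per mixture component, same length as x
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the weighted Gaussian density for this component.
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def weighted_transition_log_prob(base_log_prob, gmm_log_lik, alpha=1.0):
    """Hypothetical weighting: add the scaled GMM score to the base
    transition log-probability at a fluency/disfluency boundary."""
    return base_log_prob + alpha * gmm_log_lik
```

In a one-stage (Viterbi-style) search, a score of this kind would be applied only when a path crosses from a fluency model into a disfluency model or back, so that frames resembling a filled pause make such transitions more likely.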


Bibliographic reference.  Wu, Chung-Hsien / Yan, Gwo-Lang (2001): "Discriminative disfluency modeling for spontaneous speech recognition", In EUROSPEECH-2001, 1955-1958.