EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Efficient Decoding Strategy for Conversational Speech Recognition Using State-Space Models for Vocal-Tract-Resonance Dynamics

Jeff Z. Ma (1), Li Deng (2)

(1) BBN Technologies, USA; (2) Microsoft Company, USA

In this paper, we present an efficient strategy for likelihood computation and decoding in a continuous speech recognizer that uses state-space dynamic models of the hidden speech dynamics. The state-space models are constructed specifically to suit conversational or casual speech, where phonetic reduction abounds. We first introduce the interacting multiple model (IMM) state estimation algorithm for switching state-space models, which uses a merging strategy derived from Bayes's rule to counter the exponential growth in the number of switching combinations. We then derive a specific dynamic-programming-based decoding algorithm that incorporates this merging strategy. By merging paths, it overcomes the exponential growth in the number of search paths. Evaluation experiments on conversational speech from the Switchboard corpus demonstrate that the new decoding strategy can reduce the recognizer's word error rate relative to the baseline recognizers, namely the HMM system and the state-space dynamic model that uses HMM-produced phonetic boundaries.
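
The abstract does not give the merging formula itself. As a rough illustration only, the sketch below shows the moment-matching collapse commonly used in IMM-style estimators: several Gaussian state hypotheses, each weighted by its posterior probability, are replaced by one Gaussian with the same mean and variance. The scalar state, the function name `merge_gaussians`, and the example numbers are assumptions made for this illustration and are not taken from the paper.

```python
def merge_gaussians(weights, means, variances):
    """Collapse a weighted set of scalar Gaussians into one Gaussian.

    Hypothetical helper for illustration: the merged Gaussian matches the
    first two moments (mean and variance) of the weighted mixture.
    """
    total = sum(weights)
    # Normalize the weights so they behave as posterior probabilities.
    probs = [w / total for w in weights]
    # Mixture mean: probability-weighted average of the component means.
    mean = sum(p * m for p, m in zip(probs, means))
    # Mixture variance: within-component variance plus spread of the means.
    var = sum(
        p * (v + (m - mean) ** 2)
        for p, m, v in zip(probs, means, variances)
    )
    return mean, var


# Two equally likely hypotheses (made-up values) merged into one estimate.
merged_mean, merged_var = merge_gaussians([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

Because each merge leaves a single hypothesis per model or path, the number of hypotheses carried forward stays fixed at each frame instead of multiplying with every possible switch.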

Bibliographic reference.  Ma, Jeff Z. / Deng, Li (2001): "Efficient decoding strategy for conversational speech recognition using state-space models for vocal-tract-resonance dynamics", In EUROSPEECH-2001, 603-606.