Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Detection of Unknown Words Using Garbage Cluster Models for Continuous Speech Recognition

Hiroyuki Sakamoto, Shoichi Matsunaga

ATR Interpreting Telecommunications Research Labs., Kyoto, Japan

This paper proposes a speech recognition strategy for handling utterances that contain unknown words. The strategy integrates unknown-word detection using garbage (phoneme) cluster Hidden Markov Models (HMMs) with registered-word recognition using phoneme HMMs. The garbage cluster models are designed to minimize both the increase in processing for unknown words and the decrease in recognition performance for registered words. A further goal is to maximize the detection performance for unknown words. Two important issues are studied: 1) the balance between the scores of registered words and those of unknown words, and 2) the use of an unknown-word penalty derived from the stochastic language model. In sentence recognition experiments using this unknown-word processing, the proposed cluster models that take Japanese syllabic structure into account achieved the best word accuracy, 71.8%, compared with 56.6% for sentence recognition without this processing. These results confirm the effectiveness of the strategy.
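
The two issues named above, score balance and the unknown-word penalty, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_segment`, the parameters `score_balance` and `unk_log_penalty`, their default values, and the `<UNK>` label are all hypothetical, and the scores are assumed to be log-likelihoods for a single speech segment.

```python
def classify_segment(word_scores, garbage_score,
                     unk_log_penalty=-5.0, score_balance=1.0):
    """Pick the best registered word, or "<UNK>" if the garbage model wins.

    word_scores:     dict of registered word -> log-likelihood (assumed input)
    garbage_score:   log-likelihood from the garbage cluster models (assumed input)
    unk_log_penalty: log-domain penalty on the unknown-word hypothesis
                     (hypothetical value; the paper derives its penalty
                     from the stochastic language model)
    score_balance:   weight balancing garbage scores against word scores
                     (hypothetical value)
    """
    # Best-scoring registered word for this segment.
    best_word, best_score = max(word_scores.items(), key=lambda kv: kv[1])

    # Unknown-word hypothesis: balanced garbage score plus the penalty.
    unk_score = score_balance * garbage_score + unk_log_penalty

    if unk_score > best_score:
        return "<UNK>", unk_score
    return best_word, best_score
```

In this sketch, a more negative `unk_log_penalty` makes the recognizer less willing to declare an unknown word, which protects registered-word accuracy at the cost of missed detections; a less negative value does the opposite.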


Bibliographic reference. Sakamoto, Hiroyuki / Matsunaga, Shoichi (1995): "Detection of unknown words using garbage cluster models for continuous speech recognition", in EUROSPEECH-1995, 2103-2106.