EUROSPEECH 2001 Scandinavia
This paper shows that a domain-dependent language model and state-skipping HMMs improve word recognition accuracy on a broadcast sports news transcription task. A domain-dependent language model yields a much lower word error rate than a general model, but the training corpus for a special topic is much smaller than the general news corpus, which makes higher-order n-gram probability estimates unreliable. We therefore applied linear interpolation to smooth unreliable higher-order n-gram probabilities with more reliable lower-order n-gram probabilities. We also applied a language model adaptation technique using news manuscripts on sports topics. For acoustic modeling, we added two state-skipping paths to three-state HMMs to handle phonemes shorter than three frames. Overall, we reduced the word error rate from 15.1% to 5.8%, which is sufficient performance for real-time subtitling services.
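The smoothing idea can be illustrated with a minimal sketch of recursive linear interpolation, assuming a fixed weight per n-gram order: each order's maximum-likelihood estimate is mixed with the already-smoothed estimate of the next lower order, bottoming out in a uniform distribution over the vocabulary. The abstract does not say how the interpolation weights were chosen; the `lambdas` values, the function names `count_ngrams` and `interp_prob`, and the toy corpus below are hypothetical and for illustration only (in practice the weights are usually tuned on held-out data).

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def count_ngrams(tokens: Sequence[str], max_order: int):
    """Return (ngram_counts, context_counts) for orders 1..max_order.

    context_counts[n][h] is how often history h (length n-1) is followed
    by some word, so that P_ML(w | h) = ngram_counts[n][h + (w,)] / c(h).
    """
    ngram_counts: Dict[int, Counter] = {}
    context_counts: Dict[int, Counter] = {}
    for n in range(1, max_order + 1):
        grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        ngram_counts[n] = grams
        contexts: Counter = Counter()
        for gram, c in grams.items():
            contexts[gram[:-1]] += c  # for n == 1 the history is ()
        context_counts[n] = contexts
    return ngram_counts, context_counts


def interp_prob(word: str, history: Tuple[str, ...], ngram_counts, context_counts,
                lambdas: Dict[int, float], vocab_size: int) -> float:
    """P(w | h) = lam_n * P_ML(w | h) + (1 - lam_n) * P(w | h[1:])."""
    n = len(history) + 1
    if n == 1:
        lower = 1.0 / vocab_size  # uniform floor below the unigram level
    else:
        lower = interp_prob(word, history[1:], ngram_counts, context_counts,
                            lambdas, vocab_size)
    c_h = context_counts[n][history]
    if c_h == 0:
        return lower  # unseen history: rely entirely on the lower order
    p_ml = ngram_counts[n][history + (word,)] / c_h
    lam = lambdas[n]  # hypothetical fixed weight for order n
    return lam * p_ml + (1.0 - lam) * lower


# Toy usage with made-up data and weights (less trust in higher orders).
tokens = "the team scored a goal and the team won the match".split()
ng, ctx = count_ngrams(tokens, 3)
vocab = sorted(set(tokens))
lambdas = {1: 0.9, 2: 0.7, 3: 0.5}
total = sum(interp_prob(w, ("the", "team"), ng, ctx, lambdas, len(vocab)) for w in vocab)
print(round(total, 6))  # 1.0
```

Because every level mixes a normalized estimate with a normalized lower-order estimate, the interpolated distribution still sums to one, and a trigram never seen in the small domain corpus (for example "the team match") keeps a nonzero probability inherited from the bigram and unigram levels.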
Bibliographic reference. Matsui, Atsushi / Segi, Hiroyuki / Kobayashi, Akio / Imai, Toru / Ando, Akio (2001): "Speech recognition of broadcast sports news", In EUROSPEECH-2001, 709-712.