13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Tied-State Mixture Language Model for WFST-based Speech Recognition

Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka

Spoken Language Communication Laboratory, National Institute of Information and Communications Technology (NICT), Kyoto, Japan

This paper describes a language model combination method for automatic speech recognition (ASR) systems based on weighted finite-state transducers (WFSTs). ASR performance in real applications often degrades when an input utterance falls outside the domain of the prepared language models. Combining multiple language models is one way to cover a wide range of domains. We propose a two-step combination method: it first applies a union operation to incorporate all component models into a single transducer, and then merges states of that transducer so that n-grams shared by multiple models are mixed while n-grams unique to each model are retained. The method was evaluated in speech recognition experiments on travel conversation tasks and improved recognition performance.
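The two steps can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the merging criterion, so the sketch assumes that states are tied when they share the same n-gram history, and that shared arcs are mixed by a weighted sum. The function names (`lm_to_arcs`, `union`, `tie_states`), the bigram tables, and the mixture weights are all hypothetical, and probabilities are used directly instead of the negative-log weights a real WFST toolkit would store.

```python
from collections import defaultdict


def lm_to_arcs(model_id, bigrams):
    """Turn a toy bigram table {history: {word: prob}} into acceptor arcs.

    States are (model_id, history); an arc reading `word` leads to the
    state whose history is that word.
    """
    arcs = []
    for history, dist in bigrams.items():
        for word, prob in dist.items():
            arcs.append(((model_id, history), word, prob, (model_id, word)))
    return arcs


def union(models):
    """Step 1: collect all components in one machine; states stay disjoint."""
    arcs = []
    for model_id, bigrams in models.items():
        arcs.extend(lm_to_arcs(model_id, bigrams))
    return arcs


def tie_states(arcs, weights):
    """Step 2 (assumed criterion): merge states that share a history.

    Arcs with the same (history, word) in several models are mixed by a
    weighted sum. Arcs found in only one model are retained, scaled by
    that model's weight. No renormalization is done in this sketch.
    """
    tied = defaultdict(float)
    for (model_id, history), word, prob, (_, next_history) in arcs:
        tied[(history, word, next_history)] += weights[model_id] * prob
    return dict(tied)


# Hypothetical two-domain example.
models = {
    "travel": {"<s>": {"hello": 1.0}, "hello": {"hotel": 1.0}},
    "news": {"<s>": {"hello": 0.4, "today": 0.6}},
}
weights = {"travel": 0.5, "news": 0.5}
tied = tie_states(union(models), weights)
for (history, word, _), prob in sorted(tied.items()):
    print(f"P({word} | {history}) ~ {prob:.2f}")
```

In this sketch the bigram "<s> hello" appears in both toy models and is mixed, while "<s> today" and "hello hotel" each appear in one model only and survive in the tied machine.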

Index Terms: Language model combination, WFST


Bibliographic reference. Yamamoto, Hitoshi / Dixon, Paul R. / Matsuda, Shigeki / Hori, Chiori / Kashioka, Hideki (2012): "Tied-state mixture language model for WFST-based speech recognition", in INTERSPEECH-2012, 174-177.