5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

N-Gram Language Model Adaptation Using Small Corpus for Spoken Dialog Recognition

Akinori Ito, Hideyuki Saitoh, Masaharu Katoh, Masaki Kohda

Faculty of Engineering, Yamagata University, Yonezawa, Yamagata, Japan

This paper describes an N-gram language model adaptation technique. Because an N-gram model requires a large sample corpus for probability estimation, it is difficult to apply an N-gram model to a specific small task. In this paper, N-gram task adaptation is proposed using a large corpus of a general task (TI text) and a small corpus of the specific task (AD text). A simple weighting scheme is employed to mix the TI and AD texts. In addition to mixing the two texts, the effect of the vocabulary is also investigated. The experimental results show that the adapted N-gram model with a proper vocabulary size has significantly lower perplexity than the task-independent models.
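The abstract says only that a simple weighting mixes the TI and AD texts and that vocabulary size matters. It does not give the weighting formula. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes the weight multiplies the AD-text N-gram counts before they are pooled with the TI counts (c = c_TI + weight * c_AD). It uses a bigram model with add-one smoothing, a stand-in chosen for brevity. The vocabulary is taken to be the most frequent words, with all other words mapped to `<unk>`. All function names are hypothetical.

```python
import math
from collections import Counter

def bigrams(sents, vocab):
    """Yield (history, word) pairs; out-of-vocabulary words become <unk>."""
    for sent in sents:
        toks = ["<s>"] + [w if w in vocab else "<unk>" for w in sent] + ["</s>"]
        yield from zip(toks, toks[1:])

def select_vocab(ti_sents, ad_sents, weight, size):
    """Hypothetical vocabulary choice: the `size` words with highest weighted frequency."""
    freq = Counter()
    for sents, w in ((ti_sents, 1.0), (ad_sents, weight)):
        for sent in sents:
            for word in sent:
                freq[word] += w
    return {word for word, _ in freq.most_common(size)}

def train_mixed_bigram(ti_sents, ad_sents, weight, vocab):
    """Pool TI counts with AD counts scaled by `weight` (assumed mixing rule)."""
    hist, big = {}, {}
    for sents, w in ((ti_sents, 1.0), (ad_sents, weight)):
        for prev, cur in bigrams(sents, vocab):
            hist[prev] = hist.get(prev, 0.0) + w
            big[(prev, cur)] = big.get((prev, cur), 0.0) + w
    n_outcomes = len(vocab) + 2  # vocabulary words plus <unk> and </s>

    def prob(prev, cur):
        # Add-one smoothing over all possible next tokens.
        return (big.get((prev, cur), 0.0) + 1.0) / (hist.get(prev, 0.0) + n_outcomes)

    return prob

def perplexity(prob, sents, vocab):
    """Test-set perplexity: exp of the mean negative log probability per predicted token."""
    log_sum, n = 0.0, 0
    for prev, cur in bigrams(sents, vocab):
        log_sum += math.log(prob(prev, cur))
        n += 1
    return math.exp(-log_sum / n)

if __name__ == "__main__":
    # Toy data for illustration only; not the corpora used in the paper.
    ti = [["the", "market", "rose"], ["the", "market", "fell"]] * 5
    ad = [["reserve", "a", "flight"], ["reserve", "a", "room"]]
    test = [["reserve", "a", "flight"]]
    vocab = select_vocab(ti, ad, 10.0, 100)
    for weight in (0.0, 1.0, 10.0):
        model = train_mixed_bigram(ti, ad, weight, vocab)
        print(weight, perplexity(model, test, vocab))
```

In this sketch, a larger weight lets the few AD sentences compete with the much larger TI corpus. Perplexities are directly comparable only when the vocabulary is the same, because shrinking the vocabulary maps more words to `<unk>`.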


Bibliographic reference.  Ito, Akinori / Saitoh, Hideyuki / Katoh, Masaharu / Kohda, Masaki (1997): "N-gram language model adaptation using small corpus for spoken dialog recognition", In EUROSPEECH-1997, 2735-2738.