5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

MDI Adaptation of Language Models Across Corpora

P. Srinivasa Rao, Satya Dharanipragada, Salim Roukos

IBM, Thomas J. Watson Research Center, Yorktown Heights, NY, USA

The amount of text available for training a language model for a particular corpus is usually limited. Data from larger general or related corpora can be used to improve the performance of the language model on the corpus of interest. We explore one such method: minimum discrimination information (MDI) adaptation of a prior model estimated on a large corpus to a smaller corpus of interest. We present perplexity results for adapting a prior model built from the NAB (North American Business news) corpus to the Switchboard and ATIS corpora, and compare them with those of interpolated models.
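The abstract does not state which constraints the paper imposes, so the following is only a minimal sketch of the general MDI idea. It assumes that the in-domain information is a set of unigram marginals and that the constraints are fitted with generalized iterative scaling (GIS). The adapted model has the form P_a(w | h) = P_prior(w | h) exp(lam[w]) / Z(h). This form minimizes the history-weighted KL divergence from the prior while matching the target marginals. The sketch then compares perplexities with a linear interpolation baseline. The function names (`mdi_adapt`, `interpolate`, `perplexity`) and all toy probabilities are hypothetical illustrations, not the authors' code or data.

```python
import math

# Toy bigram models are dicts: history -> {word: P(word | history)}.
# ASSUMPTION: unigram-marginal constraints fitted by GIS; not necessarily the paper's setup.

def mdi_adapt(prior, hist_prob, target_unigram, iters=200):
    """Adapt `prior` toward unigram marginals `target_unigram` (MDI sketch).

    Finds P_a(w | h) = P_prior(w | h) * exp(lam[w]) / Z(h), the form that
    minimizes sum_h hist_prob[h] * KL(P_a(. | h) || P_prior(. | h)) subject to
    sum_h hist_prob[h] * P_a(w | h) = target_unigram[w] for every w.
    """
    vocab = list(target_unigram)
    lam = {w: 0.0 for w in vocab}

    def build():
        model = {}
        for h, dist in prior.items():
            scores = {w: dist[w] * math.exp(lam[w]) for w in vocab}
            z = sum(scores.values())  # per-history normalizer Z(h)
            model[h] = {w: s / z for w, s in scores.items()}
        return model

    for _ in range(iters):
        model = build()
        # Unigram marginal implied by the current adapted model.
        marg = {w: sum(hist_prob[h] * model[h][w] for h in model) for w in vocab}
        # GIS step: every (h, w) event activates exactly one unigram
        # feature, so the GIS constant is 1 and the update is a log ratio.
        for w in vocab:
            lam[w] += math.log(target_unigram[w] / marg[w])
    return build()


def interpolate(model_a, model_b, weight):
    """Linear interpolation: weight * P_a + (1 - weight) * P_b."""
    return {h: {w: weight * model_a[h][w] + (1 - weight) * model_b[h][w]
                for w in model_a[h]} for h in model_a}


def perplexity(model, bigrams):
    """exp of the average negative log-probability of (h, w) test pairs."""
    log_sum = sum(math.log(model[h][w]) for h, w in bigrams)
    return math.exp(-log_sum / len(bigrams))


# Hypothetical toy data: a "large corpus" prior and small-corpus statistics.
prior = {
    "a": {"a": 0.5, "b": 0.3, "c": 0.2},
    "b": {"a": 0.2, "b": 0.5, "c": 0.3},
    "c": {"a": 0.3, "b": 0.2, "c": 0.5},
}
hist_prob = {"a": 0.2, "b": 0.3, "c": 0.5}  # P(h) on the small corpus
target = {"a": 0.2, "b": 0.3, "c": 0.5}     # in-domain unigram estimate
test = [("a", "c"), ("c", "c"), ("b", "c"), ("c", "a"), ("c", "c")]

adapted = mdi_adapt(prior, hist_prob, target)
# Crude hypothetical in-domain model: the same unigram estimate in every history.
in_domain = {h: dict(target) for h in prior}
mixed = interpolate(in_domain, prior, 0.5)

for name, m in [("prior", prior), ("MDI", adapted), ("interpolated", mixed)]:
    print(f"{name:>12}: perplexity {perplexity(m, test):.3f}")
```

The MDI model changes the prior only as much as the constraints require: every history is rescaled by the same per-word factors exp(lam[w]). Linear interpolation instead mixes two complete models with a single weight, which here is fixed at 0.5 and would normally be tuned on held-out data.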


Bibliographic reference. Rao, P. Srinivasa / Dharanipragada, Satya / Roukos, Salim (1997): "MDI adaptation of language models across corpora", In EUROSPEECH-1997, 1979-1982.