4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Bayesian Estimation Methods for N-Gram Language Model Adaptation

Marcello Federico

IRST - Istituto per la Ricerca Scientifica e Tecnologica, Povo, Trento, Italy

Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models offer many computational advantages, but they also require huge text corpora for parameter estimation. Moreover, the training texts must closely reflect, in a statistical sense, the user's language. Estimating a language model on an unrepresentative sample severely degrades speech recognition performance. The Bayesian learning framework provides a solution to this problem. Beyond the classical Bayesian estimates, a Bayes-derived interpolation model is proposed. Empirical comparisons have been carried out on a 10,000-word radiological reporting domain. Results are reported in terms of perplexity and recognition accuracy.
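To make the adaptation idea concrete, the sketch below shows one standard Bayesian estimate for a bigram model. It is not necessarily the paper's method. For each history h it places a Dirichlet prior on P(w | h) with hyperparameters λ·P_bg(w | h), centered on a background model, and returns the posterior mean given counts from a small adaptation sample. The names `bayes_bigram`, `bigram_perplexity` and `prior_weight` (λ) are hypothetical. The sketch also assumes a closed vocabulary, in which every adaptation word appears in every background distribution. The toy data is illustrative only.

```python
import math
from collections import Counter


def bayes_bigram(adapt_tokens, background, prior_weight):
    """Posterior-mean estimate of P(w | h) under a Dirichlet prior whose
    hyperparameters are prior_weight * P_bg(w | h) for each history h.

    background: dict mapping history h -> dict of P_bg(w | h); assumed to
    cover a closed vocabulary and to be normalized (hypothetical layout).
    """
    pair_counts = Counter(zip(adapt_tokens, adapt_tokens[1:]))  # c(h, w)
    hist_counts = Counter(adapt_tokens[:-1])                    # c(h)
    adapted = {}
    for h, p_bg in background.items():
        n_h = hist_counts.get(h, 0)
        # (c(h, w) + lambda * P_bg(w | h)) / (c(h) + lambda)
        adapted[h] = {
            w: (pair_counts.get((h, w), 0) + prior_weight * p) / (n_h + prior_weight)
            for w, p in p_bg.items()
        }
    return adapted


def bigram_perplexity(model, tokens):
    """Perplexity of a bigram model over consecutive token pairs."""
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(model[h][w]) for h, w in pairs)
    return math.exp(-log_prob / len(pairs))


if __name__ == "__main__":
    # Toy closed vocabulary with a uniform background model (illustrative only).
    vocab = ["no", "lesion", "seen"]
    background = {h: {w: 1 / 3 for w in vocab} for h in vocab}
    adapted = bayes_bigram(["no", "lesion", "seen", "no", "lesion"], background, 3.0)
    test = ["no", "lesion", "seen"]
    print(adapted["no"]["lesion"])
    print(bigram_perplexity(background, test), bigram_perplexity(adapted, test))
```

Rewriting the estimate as c(h)/(c(h)+λ) · c(h,w)/c(h) + λ/(c(h)+λ) · P_bg(w | h) shows that it is a linear interpolation between the adaptation-sample relative frequency and the background model. Its weight grows with how often h was observed in the adaptation data, which is one common way Bayesian estimation leads to interpolation-like models.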


Bibliographic reference.  Federico, Marcello (1996): "Bayesian estimation methods for n-gram language model adaptation", In ICSLP-1996, 240-243.