4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages, but they also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user's language: estimating a language model on a non-representative sample severely degrades speech recognition performance. A solution to this problem is provided by the Bayesian learning framework. Beyond the classical estimates, a Bayes-derived interpolation model is proposed. Empirical comparisons were carried out on a 10,000-word radiological reporting domain, and results are provided in terms of perplexity and recognition accuracy.
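To illustrate the general idea of Bayesian n-gram adaptation (not necessarily the exact formulation proposed in the paper), the sketch below computes a MAP-style bigram estimate: scaled relative frequencies from a large background corpus act as a Dirichlet prior, which is then updated with counts from a small domain (adaptation) sample. The prior strength `tau` and the toy counts are purely illustrative.

```python
from collections import defaultdict

def map_bigram_probs(background_counts, adaptation_counts, tau=5.0):
    """MAP-style bigram estimate: background relative frequencies serve
    as a Dirichlet prior of strength tau, updated with adaptation counts.
    Counts are dicts mapping (history_word, word) -> count.
    NOTE: illustrative sketch, not the paper's exact model."""
    bg_tot = defaultdict(float)   # background count total per history
    ad_tot = defaultdict(float)   # adaptation count total per history
    for (h, _w), c in background_counts.items():
        bg_tot[h] += c
    for (h, _w), c in adaptation_counts.items():
        ad_tot[h] += c

    vocab = {w for (_h, w) in list(background_counts) + list(adaptation_counts)}
    probs = {}
    for h in set(bg_tot) | set(ad_tot):
        for w in vocab:
            # prior = background relative frequency (uniform if history unseen)
            if bg_tot[h] > 0:
                prior = background_counts.get((h, w), 0) / bg_tot[h]
            else:
                prior = 1.0 / len(vocab)
            c = adaptation_counts.get((h, w), 0)
            # posterior mean of the Dirichlet-multinomial model
            probs[(h, w)] = (tau * prior + c) / (tau + ad_tot[h])
    return probs
```

With few adaptation counts the estimate stays close to the background prior; as domain counts grow, it converges to the domain relative frequencies, which is the behaviour that makes Bayesian estimates attractive for adaptation from sparse in-domain text.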
Bibliographic reference. Federico, Marcello (1996): "Bayesian estimation methods for n-gram language model adaptation", In ICSLP-1996, 240-243.