Efficient MDI Adaptation for n-Gram Language Models

Ruizhe Huang, Ke Li, Ashish Arora, Daniel Povey, Sanjeev Khudanpur

This paper presents an efficient algorithm for n-gram language model adaptation under the minimum discrimination information (MDI) principle, where an out-of-domain language model is adapted to satisfy the marginal probability constraints of the in-domain data. The main challenge for MDI language model adaptation is its computational complexity. By exploiting the backoff structure of n-gram models and the hierarchical training method originally proposed for maximum entropy (ME) language models [1], we show that MDI adaptation can be computed per iteration in time linear in the size of the inputs. This complexity matches that of ME models, even though MDI is more general than ME, and makes MDI adaptation practical for large corpora and vocabularies. Experimental results confirm the scalability of our algorithm on large datasets: MDI adaptation yields slightly worse perplexity but better word error rates than simple linear interpolation.
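The following sketch illustrates the core MDI idea on a toy problem, not the paper's linear-time hierarchical algorithm: an out-of-domain conditional bigram model is adapted by per-word multiplicative factors, fitted by iterative scaling, until its unigram marginal matches an in-domain target. All sizes and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 4  # toy vocabulary size and number of histories (hypothetical)

# Out-of-domain conditional model P(w | h) and a history prior P(h)
P = rng.dirichlet(np.ones(V), size=H)   # shape (H, V); each row sums to 1
p_h = rng.dirichlet(np.ones(H))         # P(h)

# In-domain unigram marginal constraints the adapted model must satisfy
target = rng.dirichlet(np.ones(V))      # P_in(w)

# MDI solution has the form Q(w | h) ∝ P(w | h) * alpha(w);
# fit alpha by repeatedly rescaling toward the target marginal.
alpha = np.ones(V)
for _ in range(500):
    Q = P * alpha                        # unnormalized adapted model
    Q /= Q.sum(axis=1, keepdims=True)    # renormalize each history
    marginal = p_h @ Q                   # current model marginal of w
    alpha *= target / marginal           # iterative-scaling update
```

After convergence, `p_h @ Q` matches `target`, and `Q` stays as close as possible (in KL divergence) to the out-of-domain model `P`. Each iteration here touches every (history, word) pair explicitly; the paper's contribution is avoiding that cost for backoff n-gram models.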

DOI: 10.21437/Interspeech.2020-2909

Cite as: Huang, R., Li, K., Arora, A., Povey, D., Khudanpur, S. (2020) Efficient MDI Adaptation for n-Gram Language Models. Proc. Interspeech 2020, 4916-4920, DOI: 10.21437/Interspeech.2020-2909.

@inproceedings{huang2020efficient,
  author={Ruizhe Huang and Ke Li and Ashish Arora and Daniel Povey and Sanjeev Khudanpur},
  title={{Efficient MDI Adaptation for n-Gram Language Models}},
  booktitle={Proc. Interspeech 2020},
  year={2020},
  pages={4916--4920},
  doi={10.21437/Interspeech.2020-2909}
}