ISCA Archive Interspeech 2013

Improving unsupervised language model adaptation with discriminative data filtering

Shuangyu Chang, Michael Levit, Partha Parthasarathy, Benoit Dumoulin

In this paper we propose a method for improving unsupervised language model (LM) adaptation by discriminatively filtering the adaptation training material. The solution addresses two main issues: first, how to automatically identify recognition errors and more correct alternatives without manual transcription; second, how to update the model parameters based on the recognition error cues. Within the adaptation framework, we address the first issue by predicting regression pairs between recognition results from the baseline LM and an initial adapted LM, using features such as the language model score difference. For the second issue, we adopt a data filtering approach that penalizes potent error attractors introduced by the unsupervised adaptation data, using n-gram set difference statistics computed on the predicted regression pairs. Experimental results on a large real-world voice catalog search application demonstrate that the proposed solution provides a significant reduction in recognition errors over an initial adapted LM.
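
The filtering step can be illustrated with a minimal sketch. It assumes that regression pairs have already been predicted, each consisting of a baseline-LM hypothesis and an adapted-LM hypothesis judged to be worse. The function names (`ngrams`, `error_attractors`, `filter_adaptation_data`), the bigram order, the count threshold, and the hard removal of whole sentences are all illustrative assumptions; the abstract does not specify the actual statistics or penalty used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def error_attractors(regression_pairs, n=2, min_count=2):
    """Collect n-grams that appear in the adapted hypothesis but not in the
    baseline hypothesis of predicted regression pairs.

    regression_pairs: iterable of (baseline_hyp, adapted_hyp) strings.
    min_count: hypothetical threshold on how often an n-gram must occur
    in the set differences before it is treated as an error attractor.
    """
    counts = Counter()
    for baseline_hyp, adapted_hyp in regression_pairs:
        base = ngrams(baseline_hyp.split(), n)
        adapted = ngrams(adapted_hyp.split(), n)
        # Set difference: n-grams introduced only by the adapted LM.
        counts.update(adapted - base)
    return {gram for gram, count in counts.items() if count >= min_count}


def filter_adaptation_data(sentences, attractors, n=2):
    """Drop adaptation sentences that contain any flagged n-gram
    (an assumed hard filter; a softer down-weighting is equally plausible)."""
    return [s for s in sentences if not (ngrams(s.split(), n) & attractors)]


# Toy data, invented for illustration only.
pairs = [
    ("play the beatles", "play the beetles"),
    ("find the beatles album", "find the beetles album"),
]
attractors = error_attractors(pairs)
kept = filter_adaptation_data(["watch the beetles", "play the beatles"], attractors)
```

In this toy case the bigram `("the", "beetles")` occurs in both set differences and is flagged, so only the adaptation sentence without it is kept. The filtered data would then be used to re-estimate the adapted LM.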

doi: 10.21437/Interspeech.2013-328

Cite as: Chang, S., Levit, M., Parthasarathy, P., Dumoulin, B. (2013) Improving unsupervised language model adaptation with discriminative data filtering. Proc. Interspeech 2013, 1208-1212, doi: 10.21437/Interspeech.2013-328

  author={Shuangyu Chang and Michael Levit and Partha Parthasarathy and Benoit Dumoulin},
  title={{Improving unsupervised language model adaptation with discriminative data filtering}},
  booktitle={Proc. Interspeech 2013},