INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Improving Unsupervised Language Model Adaptation with Discriminative Data Filtering

Shuangyu Chang, Michael Levit, Partha Parthasarathy, Benoit Dumoulin

Microsoft Corporation, USA

In this paper we propose a method for improving unsupervised language model (LM) adaptation by discriminatively filtering the adaptation training material. Two main issues are addressed in this solution: first, how to automatically identify recognition errors and their more correct alternatives without manual transcription; second, how to update the model parameters based on the recognition error cues. Within the adaptation framework, we address the first issue by predicting regression pairs between recognition results from the baseline LM and an initial adapted LM, using features such as the language model score difference. For the second issue, we adopt a data filtering approach that penalizes potent error attractors introduced by the unsupervised adaptation data, using N-gram set difference statistics computed on the predicted regression pairs. Experimental results on a large real-world voice catalog search application demonstrate that the proposed solution provides significant recognition error reduction over an initial adapted LM.
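The abstract only outlines the filtering idea; the exact features, thresholds, and penalization scheme are described in the full paper. The minimal sketch below is an assumption-laden illustration of the N-gram set-difference step: it counts N-grams that appear in the (regressed) adapted-LM hypothesis but not in the baseline-LM hypothesis of each predicted regression pair, and then drops adaptation sentences containing frequent "error attractor" N-grams. All function names, the simple count threshold, and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter

def ngrams(tokens, n=3):
    """All n-grams up to order n in a token sequence."""
    return {tuple(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)}

def attractor_counts(regression_pairs, n=3):
    """Count n-grams present in the adapted-LM hypothesis but absent from the
    baseline-LM hypothesis of each predicted regression pair (candidate
    error attractors)."""
    counts = Counter()
    for baseline_hyp, adapted_hyp in regression_pairs:
        diff = ngrams(adapted_hyp.split(), n) - ngrams(baseline_hyp.split(), n)
        counts.update(diff)
    return counts

def filter_adaptation_data(sentences, attractors, threshold=5, n=3):
    """Drop adaptation sentences containing frequently observed error-attractor
    n-grams (hypothetical thresholding scheme, not the paper's criterion)."""
    bad = {g for g, c in attractors.items() if c >= threshold}
    return [s for s in sentences if not (ngrams(s.split(), n) & bad)]

# Toy usage with made-up hypotheses and adaptation sentences.
pairs = [("call mom now", "call mom cow"), ("play the song", "play the thong")]
attractors = attractor_counts(pairs)
adapt_data = ["call mom cow today", "play the song again"]
print(filter_adaptation_data(adapt_data, attractors, threshold=1))
```

In this toy run the first adaptation sentence is removed because it contains the attractor bigram "mom cow", while the second survives; in the paper itself the regression pairs are predicted automatically (e.g., from LM score differences) rather than given by hand.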


Bibliographic reference.  Chang, Shuangyu / Levit, Michael / Parthasarathy, Partha / Dumoulin, Benoit (2013): "Improving unsupervised language model adaptation with discriminative data filtering", In INTERSPEECH-2013, 1208-1212.