Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Unsupervised Language Model Adaptation Using Latent Semantic Marginals

Yik-Cheung Tam, Tanja Schultz

Carnegie Mellon University, USA

We integrated the Latent Dirichlet Allocation (LDA) approach, a latent semantic analysis model, into an unsupervised language model adaptation framework. We adapted a background language model by minimizing the Kullback-Leibler divergence between the adapted model and the background model, subject to the constraint that the marginalized unigram probability distribution of the adapted model equals the corresponding distribution estimated by the LDA model: the latent semantic marginals. We evaluated our approach on the RT04 Mandarin Broadcast News test set and experimented with different LM training settings. Results showed that our approach reduces perplexity and character error rate under both supervised and unsupervised adaptation.
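The constrained KL minimization described above admits a well-known closed form (often called marginal adaptation): each background n-gram probability is rescaled by the ratio of the target unigram marginal to the background unigram marginal, raised to a tuning exponent, and renormalized per history. A minimal sketch of that general technique, assuming dictionary-based LM representations; the function name and the `beta` exponent are illustrative and not taken from the paper:

```python
def adapt_lm(p_bg, p_lda_unigram, p_bg_unigram, beta=0.5):
    """Rescale background n-gram probabilities toward LDA unigram marginals.

    p_bg:          dict mapping history -> {word: prob} (background LM)
    p_lda_unigram: dict word -> prob (latent semantic marginals from LDA)
    p_bg_unigram:  dict word -> prob (background unigram marginals)
    beta:          scaling exponent, a tuning parameter (illustrative)
    """
    adapted = {}
    for hist, dist in p_bg.items():
        # Scale each word's probability by the marginal ratio.
        scaled = {w: p * (p_lda_unigram[w] / p_bg_unigram[w]) ** beta
                  for w, p in dist.items()}
        # Renormalize so each conditional distribution sums to one.
        z = sum(scaled.values())
        adapted[hist] = {w: s / z for w, s in scaled.items()}
    return adapted
```

Words favored by the LDA topic mixture (higher marginal than in the background corpus) are boosted across all histories, which is how document-level semantic context influences the n-gram model.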


Bibliographic reference. Tam, Yik-Cheung / Schultz, Tanja (2006): "Unsupervised language model adaptation using latent semantic marginals", in INTERSPEECH-2006, paper 1705-Thu1A2O.2.