12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

On the Estimation of Discount Parameters for Language Model Smoothing

Martin Sundermeyer, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

The goal of statistical language modeling is to find probability estimates for arbitrary word sequences. To obtain non-zero values for sequences that do not occur in the training data, the probability distributions estimated from that data need to be smoothed. In the widely used Kneser-Ney family of smoothing algorithms, this is achieved by absolute discounting. The discount parameters can be computed directly from approximation formulas that maximize the leaving-one-out log-likelihood of the training data.
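As an illustration of such a formula, a commonly cited single-discount estimate from the absolute discounting literature is D = n1 / (n1 + 2 n2), where n1 and n2 are the numbers of distinct n-grams seen exactly once and exactly twice in the training data. The abstract does not state which formulas serve as the paper's baseline, so the sketch below is an assumed example only; the function name and the toy corpus are hypothetical.

```python
from collections import Counter


def estimate_discount(ngram_counts):
    """Return the single-discount estimate D = n1 / (n1 + 2 * n2).

    ngram_counts maps each distinct n-gram to its training count.
    n1 and n2 are the numbers of n-grams seen exactly once and twice.
    """
    count_of_counts = Counter(ngram_counts.values())
    n1 = count_of_counts[1]
    n2 = count_of_counts[2]
    if n1 + 2 * n2 == 0:
        raise ValueError("no n-grams with count 1 or 2; estimate undefined")
    return n1 / (n1 + 2 * n2)


# Toy corpus (hypothetical): one bigram occurs twice, six occur once.
tokens = "the cat sat on the mat the cat ate".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
discount = estimate_discount(bigram_counts)
print(f"estimated discount: {discount:.2f}")
```

Because the estimate depends only on the counts of singletons and doubletons, it needs no held-out data, which is what makes it attractive as a default.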

In this work, we outline several shortcomings of the standard estimators for the discount parameters. We propose an efficient method for computing the discount values on held-out data and analyze the resulting parameter estimates. Experiments on large English and French corpora show consistent improvements in perplexity and word error rate over the baseline method. At the same time, this approach can be used for language model pruning, leading to slightly better results than standard pruning algorithms.
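The idea of choosing a discount on held-out data can be sketched with a brute-force grid search. This is not the efficient method proposed in the paper, which the abstract does not describe; it is a minimal stand-in under stated assumptions: a bigram model with interpolated absolute discounting, an add-one smoothed unigram backoff distribution, and a single discount value. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_model(train, vocab):
    """Collect the counts needed for interpolated absolute discounting."""
    bigrams = Counter(zip(train, train[1:]))
    context = Counter()    # c(v): how often v occurs as a bigram history
    followers = Counter()  # number of distinct words seen after v
    for (v, _w), c in bigrams.items():
        context[v] += c
        followers[v] += 1
    return {
        "bigrams": bigrams,
        "context": context,
        "followers": followers,
        "unigrams": Counter(train),
        "total": len(train) + len(vocab),  # add-one normalizer
    }


def prob(model, v, w, discount):
    """p(w | v) with absolute discounting, interpolated with a unigram."""
    p_uni = (model["unigrams"][w] + 1) / model["total"]
    c_v = model["context"][v]
    if c_v == 0:
        return p_uni  # unseen history: fall back to the unigram
    seen = max(model["bigrams"][(v, w)] - discount, 0.0) / c_v
    backoff_weight = discount * model["followers"][v] / c_v
    return seen + backoff_weight * p_uni


def heldout_perplexity(model, heldout, discount):
    pairs = list(zip(heldout, heldout[1:]))
    log_prob = sum(math.log(prob(model, v, w, discount)) for v, w in pairs)
    return math.exp(-log_prob / len(pairs))


# Toy data (hypothetical).
train = "the cat sat on the mat the dog sat on the rug".split()
heldout = "the cat sat on the rug the dog ate".split()
vocab = set(train) | set(heldout)
model = build_model(train, vocab)

# Grid over the open interval (0, 1); D = 0 would give zero probabilities.
candidates = [i / 20 for i in range(1, 20)]
best_discount = min(
    candidates, key=lambda d: heldout_perplexity(model, heldout, d)
)
print(f"best held-out discount on the grid: {best_discount:.2f}")
```

A grid search evaluates the full held-out perplexity once per candidate, so its cost grows quickly when several discounts per n-gram order are tuned jointly; this is the kind of cost an efficient estimation method would need to avoid.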

Bibliographic reference. Sundermeyer, Martin / Schlüter, Ralf / Ney, Hermann (2011): "On the estimation of discount parameters for language model smoothing", in INTERSPEECH-2011, 1433-1436.