4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In this paper we investigate statistical language models with a variable context length. In such models the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and present a pruning algorithm, based on this measure, for creating such models. We further address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPA-NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.
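To make the idea of a context-dependent context length concrete, the following is a minimal Python sketch of how such a model might be queried. It is not the method of the paper: the class name `VariableContextModel`, the dictionary layout, and the toy probabilities are all assumptions made for illustration, and the quality measure and pruning criterion are not shown because the abstract does not specify them. The sketch only shows that the number of history words actually used depends on which contexts the model has kept, with a simple backing-off step for words not listed under a context.

```python
from typing import Dict, List, Tuple

Context = Tuple[str, ...]


class VariableContextModel:
    """Toy variable-length model (hypothetical layout, not from the paper)."""

    def __init__(self,
                 probs: Dict[Context, Dict[str, float]],
                 backoff: Dict[Context, float]) -> None:
        self.probs = probs      # stored context -> {word: probability}
        self.backoff = backoff  # stored context -> backing-off weight

    def longest_context(self, history: List[str]) -> Context:
        """Return the longest suffix of `history` that the model stores."""
        ctx = tuple(history)
        while ctx and ctx not in self.probs:
            ctx = ctx[1:]  # drop the oldest word and try again
        return ctx

    def prob(self, word: str, history: List[str]) -> float:
        """Probability of `word`, backing off to shorter contexts if needed."""
        ctx = self.longest_context(history)
        weight = 1.0
        while True:
            dist = self.probs.get(ctx, {})
            if word in dist:
                return weight * dist[word]
            if not ctx:
                return 0.0  # word unknown even to the empty context
            weight *= self.backoff.get(ctx, 1.0)
            ctx = ctx[1:]


# Toy values (assumed): only some contexts are kept, so the effective
# context length differs from history to history.
model = VariableContextModel(
    probs={
        (): {"the": 0.25, "cat": 0.5, "sat": 0.25},
        ("the",): {"cat": 0.75},
        ("of", "the"): {"cat": 0.875},
    },
    backoff={("the",): 0.5, ("of", "the"): 0.5},
)

print(model.longest_context(["of", "the"]))  # two words of context are used
print(model.longest_context(["on", "the"]))  # only one word of context is used
print(model.prob("sat", ["of", "the"]))      # reached by backing off twice
```

In this layout, pruning a model would amount to deleting entries from `probs` and `backoff`; which entries to delete is exactly what the paper's quality measure is meant to decide.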
Bibliographic reference. Kneser, Reinhard (1996): "Statistical language modeling using a variable context length", In ICSLP-1996, 494-497.