5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Comparison of Language Modelling Techniques for Russian and English

Edward W. D. Whittaker, Philip C. Woodland

Cambridge University, UK

This paper examines the main differences between language modelling of Russian and English. A Russian corpus and a comparable English corpus are described. The effects of the high degree of inflection in Russian, and the relationship between out-of-vocabulary rate and vocabulary size, are investigated. Standard word and class N-gram language modelling techniques are applied to both corpora and perplexity results are reported. A novel approach to modelling inflected languages is proposed and its efficacy is compared with that of the other techniques.
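
As a rough illustration of the out-of-vocabulary (OOV) measurement mentioned in the abstract, the sketch below computes the fraction of test tokens not covered by the most frequent training words at a given vocabulary size. The function name `oov_rate`, the whitespace tokenisation, and the toy sentences are assumptions made for this example only; they are not taken from the paper or its corpora.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens outside the vocab_size most frequent training words.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    counts = Counter(train_tokens)
    # Vocabulary = the vocab_size most frequent word types in the training data.
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / len(test_tokens)


# Toy data, invented purely for illustration (not from the paper's corpora).
train = "the cat sat on the mat the dog sat".split()
test = "the dog sat on the rug".split()

for size in (1, 2, 100):
    print(size, oov_rate(train, test, size))
```

In a highly inflected language each lemma appears in many surface forms, so one would expect the OOV rate at a fixed vocabulary size to be higher than in English; running a function like this over a range of sizes is one simple way to trace that relationship.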

Bibliographic reference. Whittaker, Edward W. D. / Woodland, Philip C. (1998): "Comparison of language modelling techniques for Russian and English", in ICSLP-1998, paper 0967.