EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Smoothing Issues in the Structured Language Model

Woosung Kim, Sanjeev Khudanpur, Jun Wu

The Johns Hopkins University, USA

The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, and machine translation. In contrast to the situation for traditional N-gram models, optimal smoothing techniques (discounting methods and hierarchical structures for back-off) are still being developed for the SLM. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have N-gram-like structure. Deleted interpolation has so far been used to combine these N-gram-like models. We demonstrate on two different corpora, WSJ and Switchboard, that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. An improvement in word error rate is also demonstrated on the Switchboard corpus.
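As a rough illustration of the contrast drawn in the abstract, the following Python sketch compares a linearly interpolated bigram model (the form that deleted interpolation produces once its weight has been estimated on held-out data) with interpolated absolute discounting, one common nonlinear interpolation scheme. It is not the SLM and not the authors' implementation: the toy corpus, the fixed weight `lam`, the discount `d`, the add-one unigram fallback, and all class and function names are assumptions made for this example.

```python
import math
from collections import Counter


class BigramCounts:
    """Counts needed by both smoothing schemes (hypothetical helper)."""

    def __init__(self, tokens):
        self.vocab = sorted(set(tokens))
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # c(h): how often h occurs as a history (the last token has no successor)
        self.history = Counter(tokens[:-1])
        # number of distinct word types observed after each history h
        self.followers = Counter(h for (h, _w) in self.bigrams)

    def p_unigram(self, w):
        # Add-one smoothed unigram, used as the lower-order distribution.
        total = sum(self.unigrams.values())
        return (self.unigrams[w] + 1) / (total + len(self.vocab))


def p_linear(c, w, h, lam=0.7):
    """Linear interpolation with a fixed weight lam (assumed, not estimated)."""
    n = c.history[h]
    if n == 0:
        return c.p_unigram(w)
    return lam * c.bigrams[(h, w)] / n + (1.0 - lam) * c.p_unigram(w)


def p_absolute(c, w, h, d=0.5):
    """Interpolated absolute discounting with discount d, 0 < d < 1 (assumed)."""
    n = c.history[h]
    if n == 0:
        return c.p_unigram(w)
    discounted = max(c.bigrams[(h, w)] - d, 0.0) / n
    # Probability mass freed by discounting, redistributed via the unigram.
    freed_mass = d * c.followers[h] / n
    return discounted + freed_mass * c.p_unigram(w)


def perplexity(prob, c, tokens):
    """Perplexity of a token sequence under a conditional bigram model."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(c, w, h)) for h, w in pairs)
    return math.exp(-log_sum / len(pairs))


# Toy data; held-out words are assumed to be in the training vocabulary.
train_tokens = "the cat sat on the mat the dog sat on the rug".split()
heldout_tokens = "the dog sat on the mat".split()

counts = BigramCounts(train_tokens)
print("linear interpolation :", perplexity(p_linear, counts, heldout_tokens))
print("absolute discounting :", perplexity(p_absolute, counts, heldout_tokens))
```

In the linear scheme the weight given to the lower-order model is the same for every history, whereas absolute discounting makes it depend on how many distinct continuations each history has been seen with. Which scheme yields lower perplexity on a corpus this small says nothing about the paper's results on WSJ or Switchboard.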


Bibliographic reference.  Kim, Woosung / Khudanpur, Sanjeev / Wu, Jun (2001): "Smoothing issues in the structured language model", In EUROSPEECH-2001, 717-720.