4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Introducing Linguistic Constraints into Statistical Language Modeling

Petra Geutner

Interactive Systems Laboratories, University of Karlsruhe (Germany) / Carnegie Mellon University (USA)

Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper, the distinction between function and content words (closed and open word classes, respectively) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions and personal pronouns; content words are nouns, verbs, adjectives and adverbs. Applying this class definition yields function and content word markers, on which a new language model is based. A combination of the word-based model with this new model is then introduced. The combined model shows modest improvements in both perplexity and recognition performance.
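The abstract does not specify how the two models are combined, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a bigram history, a class-based decomposition P(w | h) ≈ P(m(w) | m(h)) · P(w | m(w)) over function/content markers, and linear interpolation with a weight `lam`. The names `FUNCTION_WORDS`, `marker`, `CombinedBigramLM` and `lam` are hypothetical, and the tiny function-word list is illustrative only.

```python
from collections import Counter

# Hypothetical, illustrative closed-class (function word) list.
FUNCTION_WORDS = {"the", "a", "an", "in", "on", "of", "to",
                  "i", "you", "he", "she", "it", "we", "they"}


def marker(word: str) -> str:
    """Map a word to a function ('F') or content ('C') marker."""
    if word == "<s>":
        return "<s>"  # sentence-start symbol keeps its own marker
    return "F" if word.lower() in FUNCTION_WORDS else "C"


class CombinedBigramLM:
    """Word bigram model linearly interpolated with a marker-based bigram model.

    The interpolation scheme and weight `lam` are assumptions for illustration.
    """

    def __init__(self, sentences, lam=0.5):
        self.lam = lam
        self.word_bi = Counter()     # counts of (previous word, word)
        self.word_hist = Counter()   # counts of words used as history
        self.mark_bi = Counter()     # counts of (previous marker, marker)
        self.mark_hist = Counter()   # counts of markers used as history
        self.emit = Counter()        # counts of (marker, word)
        self.mark_total = Counter()  # counts of each marker as a predicted token
        for sent in sentences:
            tokens = ["<s>"] + list(sent)
            for prev, cur in zip(tokens, tokens[1:]):
                mp, mc = marker(prev), marker(cur)
                self.word_bi[(prev, cur)] += 1
                self.word_hist[prev] += 1
                self.mark_bi[(mp, mc)] += 1
                self.mark_hist[mp] += 1
                self.emit[(mc, cur)] += 1
                self.mark_total[mc] += 1

    def prob(self, prev: str, cur: str) -> float:
        # Maximum-likelihood word bigram estimate (0 for unseen histories).
        h = self.word_hist[prev]
        p_word = self.word_bi[(prev, cur)] / h if h else 0.0
        # Marker bigram times word-given-marker probability.
        mp, mc = marker(prev), marker(cur)
        hm = self.mark_hist[mp]
        p_mark = self.mark_bi[(mp, mc)] / hm if hm else 0.0
        tm = self.mark_total[mc]
        p_emit = self.emit[(mc, cur)] / tm if tm else 0.0
        return self.lam * p_word + (1.0 - self.lam) * p_mark * p_emit


if __name__ == "__main__":
    lm = CombinedBigramLM([["the", "dog", "runs"], ["a", "cat", "runs"]])
    # "the cat" never occurs as a word bigram, but the F -> C marker
    # transition still assigns it probability mass.
    print(lm.prob("the", "cat"))
```

With this toy corpus the word bigram "the cat" is unseen, yet the combined model gives it 0.125 because the marker model has observed the function-to-content transition. This illustrates one plausible motivation for combining a word-based model with a coarser marker-based one.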


Bibliographic reference.  Geutner, Petra (1996): "Introducing linguistic constraints into statistical language modeling", In ICSLP-1996, 402-405.