EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

                 

Triggering Individual Word Domains in N-Gram Language Models

E. I. Sicilia-Garcia, Ji Ming, F. J. Smith

Queen's University of Belfast, UK

We present a new method of introducing domain knowledge into an n-gram language model, based on combining language models for individual word domains. Each word model is built from its own corpus, formed by extracting those subsets of the entire training corpus that contain the significant word. At test time, significant words are extracted from a cache and their models are combined with a global language model. Several methods of combining the models are described; a simple one, which combines frequencies rather than probabilities, gives promising results and offers a relatively simple way of introducing domain information into an n-gram language model. It yields a 20% reduction in language model perplexity over the standard 3-gram approach, similar to the results obtained with other, more complex domain models. The model also requires a smaller cache than other cache-based models.
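The frequency-combination idea can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the use of whole sentences as the extracted "subsets", the single `weight` parameter, and the toy corpus are all hypothetical, and the sketch uses unsmoothed relative frequencies where a real system would need smoothing or back-off. It should not be read as the authors' implementation.

```python
from collections import Counter

def trigram_counts(sentences):
    """Count trigrams and their two-word histories over tokenized sentences."""
    tri, hist = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(toks)):
            h = (toks[i - 2], toks[i - 1])
            tri[h + (toks[i],)] += 1
            hist[h] += 1
    return tri, hist

def build_word_models(sentences, significant_words):
    """One count table per significant word, built only from the
    sentences that contain that word (assumed unit of extraction)."""
    return {
        w: trigram_counts([s for s in sentences if w in s])
        for w in significant_words
    }

def combined_prob(word, history, global_model, word_models, cache, weight=1.0):
    """Relative frequency after adding the (weighted) counts of every
    word model triggered by the cache to the global counts."""
    g_tri, g_hist = global_model
    num = g_tri[history + (word,)]
    den = g_hist[history]
    for w in set(cache):
        if w in word_models:
            tri, hist = word_models[w]
            num += weight * tri[history + (word,)]
            den += weight * hist[history]
    return num / den if den else 0.0

# Toy data, invented for this sketch.
sentences = [
    ["the", "bank", "raised", "rates"],
    ["the", "bank", "raised", "funds"],
    ["the", "river", "bank", "flooded"],
    ["rates", "fell", "again"],
]
global_model = trigram_counts(sentences)
word_models = build_word_models(sentences, ["rates"])
history = ("bank", "raised")

p_base = combined_prob("rates", history, global_model, word_models, cache=[])
p_trig = combined_prob("rates", history, global_model, word_models, cache=["rates"])
```

With an empty cache the estimate is the plain global relative frequency; once "rates" is in the cache, the counts from its word model are added to both numerator and denominator, which raises the estimate for "rates" after this history. Because counts are summed before dividing, the combined estimates for a given history still sum to one without a separate renormalization step, which is one plausible reason a frequency-based combination is simpler than interpolating probabilities.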

Bibliographic reference. Sicilia-Garcia, E. I. / Ming, Ji / Smith, F. J. (2001): "Triggering individual word domains in n-gram language models", in EUROSPEECH-2001, 701-704.