Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

A Statistical Approach to Language Modelling for the ATIS Task

Joshua Koppelman, Stephen Della Pietra, Mark Epstein, Salim Roukos, Todd Ward

Human Language Technologies Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

The goal of this research is to develop an effective natural language component for IBM's spoken language understanding system for the ATIS domain. We use training data to assign a probability distribution to the reference interpretation, the NLParse, choosing the distribution that minimizes the observed perplexity of the test data. We limit our scope to those ATIS2 sentences that can be understood unambiguously out of context (the so-called "Class A" queries). The decoder component of the finished system will use the natural language probabilities to select the most probable NLParse translations for a given English input. The NLParse translation can then be deterministically converted to SQL to query the ATIS database for the correct answer. We use several deleted interpolation and maximum entropy techniques to improve on the standard trigram model, and we achieve a reduction in test perplexity from 15.9 to 14.1 bits per item.
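
As an illustration of the kind of baseline the abstract refers to, the following is a minimal sketch of a trigram model smoothed by deleted interpolation, with mixture weights fitted by EM on held-out data and quality reported in bits per item. It is not the authors' implementation. The class name `InterpolatedTrigram`, the four-way mixture (uniform, unigram, bigram, trigram), the single global weight set, and the toy English queries are all assumptions made for this example; the paper models NLParse items, which are not reproduced here. "Bits per item" is assumed to mean the average negative base-2 log probability per predicted token.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def ngrams(sentence):
    """Yield (h2, h1, w) triples for a tokenised sentence, padded at the boundaries."""
    toks = [BOS, BOS] + list(sentence) + [EOS]
    for i in range(2, len(toks)):
        yield toks[i - 2], toks[i - 1], toks[i]

class InterpolatedTrigram:
    """Hypothetical trigram model smoothed by linear (deleted) interpolation."""

    def __init__(self, train, lambdas=(0.25, 0.25, 0.25, 0.25)):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # history counts
        for sent in train:
            for h2, h1, w in ngrams(sent):
                self.uni[w] += 1
                self.bi[(h1, w)] += 1
                self.tri[(h2, h1, w)] += 1
                self.ctx1[h1] += 1
                self.ctx2[(h2, h1)] += 1
        self.total = sum(self.uni.values())
        # Assumption: one extra type reserves uniform mass for unseen items.
        self.vocab_size = len(self.uni) + 1
        self.lambdas = list(lambdas)

    def components(self, h2, h1, w):
        """Uniform, unigram, bigram and trigram relative-frequency estimates."""
        c1, c2 = self.ctx1[h1], self.ctx2[(h2, h1)]
        return [
            1.0 / self.vocab_size,
            self.uni[w] / self.total,
            self.bi[(h1, w)] / c1 if c1 else 0.0,  # unseen history contributes 0
            self.tri[(h2, h1, w)] / c2 if c2 else 0.0,
        ]

    def prob(self, h2, h1, w):
        return sum(l * p for l, p in zip(self.lambdas, self.components(h2, h1, w)))

    def fit_lambdas(self, heldout, iterations=20):
        """EM re-estimation of the mixture weights on held-out sentences."""
        events = [e for sent in heldout for e in ngrams(sent)]
        for _ in range(iterations):
            expected = [0.0] * 4
            for e in events:
                weighted = [l * p for l, p in zip(self.lambdas, self.components(*e))]
                z = sum(weighted)  # positive because the uniform term is positive
                for k in range(4):
                    expected[k] += weighted[k] / z
            self.lambdas = [x / len(events) for x in expected]

    def bits_per_item(self, sentences):
        """Average negative log2 probability per predicted token."""
        events = [e for sent in sentences for e in ngrams(sent)]
        return -sum(math.log2(self.prob(*e)) for e in events) / len(events)

# Toy, made-up queries in the style of the ATIS domain (not ATIS data).
train = [
    "show flights from boston to denver".split(),
    "show flights from denver to boston".split(),
    "list fares from boston to atlanta".split(),
    "show fares from atlanta to denver".split(),
]
heldout = [
    "list flights from atlanta to boston".split(),
    "show fares from denver to atlanta".split(),
]
test = ["list flights from boston to atlanta".split()]

model = InterpolatedTrigram(train)
model.fit_lambdas(heldout)
bits = model.bits_per_item(test)
print(f"lambdas = {[round(l, 3) for l in model.lambdas]}")
print(f"test: {bits:.2f} bits per item (perplexity {2 ** bits:.1f})")
```

In this sketch the weights are fitted on sentences withheld from the count tables, which is the idea behind deleted interpolation. Practical systems usually tie the weights to buckets of history counts rather than using one global set.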

Bibliographic reference. Koppelman, Joshua / Della Pietra, Stephen / Epstein, Mark / Roukos, Salim / Ward, Todd (1995): "A statistical approach to language modelling for the ATIS task", In EUROSPEECH-1995, 1785-1788.