5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

A Latent Semantic Analysis Framework for Large-Span Language Modeling

Jerome R. Bellegarda

Advanced Technology Group, Apple Computer, Cupertino, California, USA

A new framework is proposed to construct large-span, semantically derived language models for large-vocabulary speech recognition. It is based on the latent semantic analysis paradigm, which seeks to automatically uncover the salient semantic relationships between words and documents in a given corpus. Because of its semantic nature, a latent semantic language model is well suited to complement a conventional, more syntactically oriented n-gram. An integrative formulation is proposed for the combination of the two paradigms. The performance of the resulting integrated language model, as measured by perplexity, compares favorably with the corresponding n-gram performance.
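As a rough illustration of the latent semantic analysis idea mentioned above, the following sketch builds a small word-document count matrix and reduces it with a truncated singular value decomposition, the usual tool for this kind of analysis. It is not the paper's implementation: the toy corpus, the use of raw counts without any weighting, the choice of k = 2, and the helper names lsa_word_vectors and cosine are all assumptions made for the example.

```python
import numpy as np


def lsa_word_vectors(docs, k):
    """Return (vocab, vectors): one k-dimensional latent vector per word.

    Hypothetical helper: uses raw counts, which is an assumption; the
    weighting used in the paper is not given in the abstract.
    """
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Word-document matrix: rows are words, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            counts[index[w], j] += 1.0

    # Truncated SVD: keep only the k largest singular directions.
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return vocab, U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two latent word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy corpus (invented for illustration): two "finance" and two "weather" documents.
docs = [
    "stock market shares trading",
    "market shares investors stock",
    "rain weather storm clouds",
    "storm clouds rain forecast",
]

vocab, vectors = lsa_word_vectors(docs, k=2)
vec = {w: vectors[i] for i, w in enumerate(vocab)}

# "trading" and "investors" never appear in the same document,
# yet they share context words, so they end up close in the latent space.
print("trading vs investors:", cosine(vec["trading"], vec["investors"]))
print("trading vs rain:     ", cosine(vec["trading"], vec["rain"]))
```

In this sketch, words that never co-occur directly can still be close in the reduced space because they share neighbouring words, which is the kind of large-span, semantic relationship that a local n-gram window does not capture.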

Bibliographic reference. Bellegarda, Jerome R. (1997): "A latent semantic analysis framework for large-span language modeling", In EUROSPEECH-1997, 1451-1454.