Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


Lexical Tree Compression

Roxane Lacouture, Renato De Mori

McGill University, Montreal, Quebec, Canada

One of the problems of Automatic Speech Recognition (ASR) systems dealing with large lexica (more than 10,000 words) is the time needed to extract candidate words from a given input signal. Ideally, one should execute a Viterbi-like algorithm for each word model and pick the best word(s) or, in case of continuous speech, the best sequence(s) of words. However, the practical time complexity of such an operation is too high for large lexica. To overcome this problem, we propose to compress the lexicon into a graph by merging common suffixes and prefixes. For French and English lexica respectively, this yields a 90% and 70% reduction in memory space. A reduction in time complexity was also achieved which allowed us to execute in real time an exact A* search algorithm on the whole graph for isolated words as well as for whole sentences with no need for any fastmatch or other heuristics. Keywords: Large lexicon access, Graph, Lexicon Compression, Lexicon Representation, A* search

Full Paper

Bibliographic reference.  Lacouture, Roxane / Mori, Renato De (1991): "Lexical tree compression", In EUROSPEECH-1991, 581-584.