Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

A Language Model for Compound Words in Speech Recognition

Marcus Spies

Institute for Logic and Linguistics, European Language Business Unit, Scientific Center Heidelberg IBM Germany Information Systems, Heidelberg, Germany

In several languages, words can be combined into compound words. Current speech recognition systems treat compound words as additional single words. This creates redundancies in the phonetic word models that have to be stored and searched during recognition. It also weakens the word and n-gram frequency estimates in the language model. This paper describes a novel approach to speech recognition with vocabularies that contain only the constituent words of compounds. A compound word is recognized by a dedicated accessory language model that evaluates compound word hypotheses only. In this way, very large vocabularies (> 100,000 words) can be handled efficiently. In preliminary recognition tests, the model performed well.
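The abstract does not give the model's details, so the following is only a toy sketch of the general idea under stated assumptions, not the author's method. The recognizer is assumed to output a sequence of constituent words, and a separate, hypothetical compound model (`join_prob`, pairwise probabilities that the right word attaches to the left one inside a compound) decides greedily which adjacent constituents to join. The pairwise scoring, the greedy merge, and the `threshold` parameter are all illustrative assumptions. Linking elements such as the German "-s-" are ignored.

```python
import math

def compound_score(parts, join_prob):
    """Log-probability that the constituent sequence `parts` forms one compound.

    `join_prob` is a hypothetical accessory model mapping (left, right)
    constituent pairs to the probability that `right` attaches to `left`
    inside a compound. Unknown pairs can never be joined.
    """
    score = 0.0
    for left, right in zip(parts, parts[1:]):
        p = join_prob.get((left, right), 0.0)
        if p <= 0.0:
            return float("-inf")
        score += math.log(p)
    return score

def merge_compounds(hypothesis, join_prob, threshold=math.log(0.5)):
    """Greedily join adjacent constituents whose compound score reaches `threshold`.

    Only compound hypotheses are scored here; ordinary word sequences are
    left to the main language model (not modeled in this sketch).
    """
    groups = []
    for word in hypothesis:
        if groups:
            candidate = groups[-1] + [word]
            if compound_score(candidate, join_prob) >= threshold:
                groups[-1] = candidate  # extend the current compound
                continue
        groups.append([word])  # start a new word or compound
    return ["".join(parts) for parts in groups]

# Toy usage with made-up probabilities (not data from the paper).
toy_model = {("sprach", "modell"): 0.9}
print(merge_compounds(["das", "sprach", "modell", "ist", "klein"], toy_model))
```

With these toy values the constituents "sprach" and "modell" are joined into "sprachmodell", while all other words stay separate. Keeping the compound decision in a separate model means the recognition vocabulary itself never needs a dedicated entry for each compound.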


Bibliographic reference: Spies, Marcus (1995): "A language model for compound words in speech recognition", In EUROSPEECH-1995, 1767-1770.