4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A Category Based Approach for Recognition of Out-of-Vocabulary Words

Florian Gallwitz, Elmar Nöth, Heinrich Niemann

Lehrstuhl für Mustererkennung (Informatik 5), Universität Erlangen/Nürnberg, Erlangen, Germany

In almost all applications of automatic speech recognition, and especially in spontaneous speech tasks, the recognizer vocabulary cannot cover every word that occurs. A significant number of out-of-vocabulary words always remains, even when the vocabulary is very large. In this paper we present a new approach for integrating out-of-vocabulary words into statistical language models. We use category information for all words in the training corpus to define a function that approximates the out-of-vocabulary word emission probability for each word category, and we integrate this information into the language models. Although we use a simple acoustic model for out-of-vocabulary words, we achieve a 6% reduction in word error rate on spontaneous speech data with an out-of-vocabulary rate of about 5%.
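The abstract does not spell out the function used to approximate the per-category out-of-vocabulary emission probability. As a minimal sketch of the general idea, and not the authors' method, the Python below swaps in a Good-Turing-style estimate: for each category, the share of tokens whose word was seen exactly once in that category is taken as a rough stand-in for the probability that the category emits an unseen word. The function name `oov_probability_by_category`, the category labels `FUNCTION` and `CITY`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def oov_probability_by_category(tagged_tokens):
    """Estimate P(OOV | category) as N1(c) / N(c).

    N(c)  = number of training tokens tagged with category c
    N1(c) = number of distinct words seen exactly once in category c

    This Good-Turing-style estimate is an assumption made for
    illustration; it is not the function defined in the paper.
    """
    tokens_per_category = Counter()
    word_counts = defaultdict(Counter)

    for word, category in tagged_tokens:
        tokens_per_category[category] += 1
        word_counts[category][word] += 1

    estimates = {}
    for category, total in tokens_per_category.items():
        singletons = sum(
            1 for count in word_counts[category].values() if count == 1
        )
        estimates[category] = singletons / total
    return estimates


# Hypothetical toy corpus of (word, category) pairs.
corpus = [
    ("to", "FUNCTION"), ("to", "FUNCTION"),
    ("the", "FUNCTION"), ("the", "FUNCTION"),
    ("Erlangen", "CITY"), ("Hamburg", "CITY"),
    ("Munich", "CITY"), ("Munich", "CITY"),
]

probs = oov_probability_by_category(corpus)
for category in sorted(probs):
    print(category, probs[category])
```

In this toy corpus the closed `FUNCTION` category has no singleton words, so its estimate is zero, while the open `CITY` category, where new names keep appearing, receives a much larger share. A language model could then reserve that probability mass for an out-of-vocabulary word entry within each category, which is the kind of integration the abstract describes.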

Bibliographic reference. Gallwitz, Florian / Nöth, Elmar / Niemann, Heinrich (1996): "A category based approach for recognition of out-of-vocabulary words", in ICSLP-1996, 228-231.