EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Learning Units for Domain-Independent Out-of-Vocabulary Word Modelling

Issam Bazzi, James Glass

MIT Laboratory for Computer Science, USA

This paper describes our recent work on detecting and recognizing out-of-vocabulary (OOV) words for robust speech recognition and understanding. To allow for OOV recognition within a word-based recognizer, the in-vocabulary (IV) word network is augmented with an OOV model so that OOV words are considered simultaneously with IV words during recognition. We explore several configurations for the OOV model, the best of which utilizes a set of domain-independent, automatically derived, variable-length units. The units are created using an iterative bottom-up procedure where, at each iteration, the unit pairs with maximum mutual information are merged. When evaluating this method on a weather information domain, the false alarm rate of our baseline OOV model is reduced by over 60%. For example, with an OOV detection rate of 70%, the OOV false alarm rate is reduced from 8.5% to 3.2%, with only 3% relative degradation in word error rate on IV data.
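
The iterative bottom-up merging described above can be illustrated with a minimal sketch. The abstract does not give the exact mutual information criterion, the starting unit inventory, or the stopping rule, so everything here is an assumption for illustration: the score is a weighted pointwise mutual information computed from relative frequencies, the corpus is a list of phone sequences, merged units are named by joining with an underscore, and the function names (best_pair, merge_pair, learn_units) are hypothetical.

```python
import math
from collections import Counter


def best_pair(corpus):
    """Return the adjacent unit pair with the highest weighted mutual information.

    Assumed score (not taken from the paper):
        p(x, y) * log(p(x, y) / (p(x) * p(y)))
    estimated from relative frequencies in the corpus.
    """
    unigrams = Counter(u for seq in corpus for u in seq)
    bigrams = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0:
        return None  # no adjacent pairs left to merge

    def score(pair):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return p_xy * math.log(p_xy / (p_x * p_y))

    # Iterate in sorted order so that ties are broken deterministically.
    return max(sorted(bigrams), key=score)


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "_" + seq[i + 1])  # hypothetical naming scheme
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_units(corpus, iterations):
    """Run a fixed number of merge iterations (an assumed stopping rule)."""
    corpus = [list(seq) for seq in corpus]
    for _ in range(iterations):
        pair = best_pair(corpus)
        if pair is None:
            break
        corpus = [merge_pair(seq, pair) for seq in corpus]
    inventory = sorted({u for seq in corpus for u in seq})
    return corpus, inventory


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "e"]]
    print(learn_units(toy, 1))
```

Each iteration merges exactly one pair, so the resulting units are variable-length: frequently co-occurring phone sequences grow into longer units over successive iterations, while rare phones stay as single units.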


Bibliographic reference. Bazzi, Issam / Glass, James (2001): "Learning units for domain-independent out-of-vocabulary word modelling", In EUROSPEECH-2001, 61-64.