EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Structured Language Model for Class Identification of Out-Of-Vocabulary Words Arising from Multiple Wordclasses

Shigehiko Onishi, Hirofumi Yamamoto, Yoshinori Sagisaka

ATR Spoken Language Translation Research Laboratories, Japan

A structured language model (STLM) is proposed to cope with out-of-vocabulary (OOV) words that arise from multiple word classes. The STLM aims to model each class independently, without interference between classes, and to identify the class of each OOV word. It consists of a conventional word-class N-gram and a set of independently trained, class-specific sub-word N-grams. We built an experimental language model that applies the STLM to two similar proper-noun classes and carried out speech recognition experiments. In these experiments, no OOV word of one class was misrecognized as a word of the other class, which indicates that the STLM can integrate multiple different statistical language models without interference.
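
The two-level structure described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes character bigrams with add-one smoothing as the sub-word N-grams, fixed class priors standing in for the word-class N-gram score, and toy training lists. All names (`SubwordBigram`, `identify_class`, `class_a`, `class_b`) and all data are hypothetical.

```python
import math
import string
from collections import defaultdict

BOS, EOS = "<s>", "</s>"


class SubwordBigram:
    """Add-one-smoothed character bigram model trained on a single class."""

    def __init__(self, words, alphabet=string.ascii_lowercase):
        # Symbols that can be predicted: every letter plus the end marker.
        self.vocab = set(alphabet) | {EOS}
        self.counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            symbols = [BOS] + list(word) + [EOS]
            for prev, cur in zip(symbols, symbols[1:]):
                self.counts[prev][cur] += 1

    def log_prob(self, word):
        """Log probability of the character sequence under this class model."""
        symbols = [BOS] + list(word) + [EOS]
        total = 0.0
        for prev, cur in zip(symbols, symbols[1:]):
            row = self.counts[prev]
            numerator = row[cur] + 1
            denominator = sum(row.values()) + len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def identify_class(word, class_log_priors, subword_models):
    """Pick the class maximizing log P(class) + log P(sub-words | class).

    class_log_priors is a stand-in (assumption) for the score that the
    word-class N-gram would assign to each class in the current context.
    """
    scores = {
        name: class_log_priors[name] + model.log_prob(word)
        for name, model in subword_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy training data (hypothetical): each class model sees only its own words,
# so training one class cannot change the statistics of the other.
subword_models = {
    "class_a": SubwordBigram(["tanaka", "nakata", "kanata"]),
    "class_b": SubwordBigram(["milo", "lomi", "olim"]),
}
class_log_priors = {"class_a": math.log(0.5), "class_b": math.log(0.5)}

label_a, _ = identify_class("takana", class_log_priors, subword_models)
label_b, _ = identify_class("moli", class_log_priors, subword_models)
print(label_a, label_b)
```

Neither test word appears in the training lists, so both are OOV for this toy model. Each is assigned to the class whose sub-word statistics explain its spelling better. Because every class model is trained separately, adding or retraining one class leaves the others untouched, which is the kind of independence the abstract describes.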


Bibliographic reference. Onishi, Shigehiko / Yamamoto, Hirofumi / Sagisaka, Yoshinori (2001): "Structured language model for class identification of out-of-vocabulary words arising from multiple wordclasses", in EUROSPEECH-2001, 693-696.