4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Language Modeling by String Pattern N-gram for Japanese Speech Recognition

Akinori Ito, Masaki Kohda

Yamagata University, Yonezawa, Japan

This paper describes a new, powerful statistical language model based on the N-gram model for Japanese speech recognition. In English, a sentence is written word by word, whereas a Japanese sentence has no word boundary characters. A Japanese sentence therefore requires word segmentation by morphological analysis before a word N-gram can be constructed. We propose an N-gram based language model that requires no word segmentation. The model uses character string patterns as the units of the N-gram, and the string patterns are chosen from the training text according to a statistical criterion. We carried out several experiments comparing the perplexities of the proposed and conventional models, and the results showed the advantage of our model. Because many readers may be interested in other languages, we also applied the method to English text. In a preliminary experiment, the proposed method performed better than a conventional word trigram.

Bibliographic reference.  Ito, Akinori / Kohda, Masaki (1996): "Language modeling by string pattern n-gram for Japanese speech recognition", In ICSLP-1996, 490-493.