First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Speaker Independent Word Recognition System Based on the Structured Transition Network of Phonetic Segments

Nobuo Sugi (1), Jun'ichi Iwasaki (1), Hiroshi Matsu'ura (1), Tsuneo Nitta (1), Akira Fukumine (2), Akira Nakayama (2)

(1) Information & Communication Systems Laboratory, TOSHIBA Corporation, Kawasaki, Japan
(2) Toshiba Computer Engineering Corporation, Tokyo, Japan

This paper proposes a new word-recognition method based on Structured Transition Networks (STNs) of phonetic segments. Phonetic segments are multiple phonological units comprising about 600 acoustic/phonetic structures, each 32 to 96 ms in duration. An STN is a state transition network composed of a main path, which represents a standard speech pattern, and branches, which represent distorted patterns. Representing speech fluctuation flexibly with these branches yields high rejection performance. Because the networks are designed using acoustic/phonetic knowledge, the method requires less training data than other statistical approaches. In an evaluation on 16 spoken words uttered by 10 unknown speakers, the method achieved a recognition rate of 93.1% and a rejection rate of 92.5% for utterances outside the vocabulary.
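
The main-path-plus-branches idea can be illustrated with a small sketch. This is not the authors' implementation: the class name `WordNetwork`, the branch encoding, and the single-letter segment labels are hypothetical. It also assumes exact label matching, whereas a real recognizer would presumably score segments by similarity rather than require exact matches. The sketch shows only how branches let a network accept distorted variants of a word while still rejecting input that follows no path.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WordNetwork:
    """Toy transition network: a main path plus optional branches (illustrative only)."""
    word: str
    main_path: List[str]  # standard segment sequence for the word
    # Each branch is (from_state, to_state, segment_label); hypothetical encoding.
    branches: List[Tuple[int, int, str]] = field(default_factory=list)

    def arcs(self) -> List[Tuple[int, int, str]]:
        # Main path: state i --main_path[i]--> state i + 1
        main = [(i, i + 1, seg) for i, seg in enumerate(self.main_path)]
        return main + self.branches

    def accepts(self, segments: List[str]) -> bool:
        """True if some path from state 0 to the final state spells `segments`."""
        final_state = len(self.main_path)
        arcs = self.arcs()
        current = {0}
        for seg in segments:
            current = {dst for (src, dst, label) in arcs
                       if src in current and label == seg}
            if not current:
                return False  # no arc matches: the input leaves the network
        return final_state in current


def recognize(networks: List[WordNetwork], segments: List[str]) -> Optional[str]:
    """Return the first word whose network accepts the input, or None (rejection)."""
    for net in networks:
        if net.accepts(segments):
            return net.word
    return None


# Hypothetical word with two distortion branches:
#   (1, 2, "u"): the second segment is realized as "u" instead of "o"
#   (2, 4, "e"): the third segment "b" is dropped
kobe = WordNetwork("kobe", ["k", "o", "b", "e"],
                   branches=[(1, 2, "u"), (2, 4, "e")])

standard = recognize([kobe], ["k", "o", "b", "e"])   # follows the main path
distorted = recognize([kobe], ["k", "u", "b", "e"])  # uses the first branch
shortened = recognize([kobe], ["k", "o", "e"])       # uses the second branch
rejected = recognize([kobe], ["k", "a", "b", "e"])   # no matching path
```

In this sketch, rejection is simply the absence of any accepting path, so only the distortions explicitly encoded as branches are tolerated.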


Bibliographic reference. Sugi, Nobuo / Iwasaki, Jun'ichi / Matsu'ura, Hiroshi / Nitta, Tsuneo / Fukumine, Akira / Nakayama, Akira (1990): "Speaker independent word recognition system based on the structured transition network of phonetic segments", in ICSLP-1990, 533-536.