First International Conference on Spoken Language Processing (ICSLP 90)
This paper proposes a new word-recognition method based on Structured Transition Networks (STNs) of phonetic segments. Phonetic segments are multiple phonological units consisting of about 600 acoustic/phonetic structures, each 32 to 96 ms in duration. An STN is a state transition network composed of a main path, which represents a standard speech pattern, and branches, which represent distorted patterns. Representing speech fluctuation flexibly with these branches yields high rejection performance. Because the networks are designed using acoustic/phonetic knowledge, they require less training data than statistical approaches. In an evaluation on 16 spoken words uttered by 10 unknown speakers, the method achieved a recognition rate of 93.1% and a rejection rate of 92.5% for utterances outside the vocabulary.
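As a rough illustration of the main-path-plus-branches idea, the sketch below builds a toy transition network over segment labels and checks whether a label sequence reaches the final state. It is not the authors' implementation: the class name `StructuredTransitionNetwork`, the labels "s", "a", "e", "n", the specific branches, and the hard accept/reject decision are all assumptions made for illustration, since the abstract does not specify the network topology, the segment inventory, or how matches are scored.

```python
from typing import Dict, List, Set, Tuple


class StructuredTransitionNetwork:
    """Toy network: a main path of segment labels plus optional branches.

    Hypothetical illustration only; the paper's actual networks and
    scoring are not described in the abstract.
    """

    def __init__(self, main_path: List[str]) -> None:
        # State i is reached after consuming the first i main-path segments.
        self.final_state = len(main_path)
        self.arcs: Dict[Tuple[int, str], Set[int]] = {}
        for i, label in enumerate(main_path):
            self.add_arc(i, label, i + 1)

    def add_arc(self, src: int, label: str, dst: int) -> None:
        """Add a transition; branches are arcs that leave the main path."""
        self.arcs.setdefault((src, label), set()).add(dst)

    def accepts(self, segments: List[str]) -> bool:
        """Return True if the whole sequence ends in the final state."""
        states = {0}
        for label in segments:
            # Follow every arc that matches the current label.
            states = {
                dst
                for src in states
                for dst in self.arcs.get((src, label), ())
            }
            if not states:
                return False  # no path explains this segment: reject
        return self.final_state in states


# Main path: the assumed standard pattern "s" -> "a" -> "n".
stn = StructuredTransitionNetwork(["s", "a", "n"])
stn.add_arc(1, "a", 1)  # branch: lengthened vowel (repeat "a")
stn.add_arc(1, "e", 2)  # branch: vowel realized as a variant segment

print(stn.accepts(["s", "a", "n"]))       # standard pattern
print(stn.accepts(["s", "a", "a", "n"]))  # lengthened vowel
print(stn.accepts(["s", "o", "n"]))       # pattern covered by no arc
```

In this sketch, rejection falls out of the structure: a sequence is accepted only if the main path or an explicitly added branch covers every segment, so patterns that no arc anticipates are rejected rather than forced onto the nearest word.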
Bibliographic reference. Sugi, Nobuo / Iwasaki, Jun'ichi / Matsu'ura, Hiroshi / Nitta, Tsuneo / Fukumine, Akira / Nakayama, Akira (1990): "Speaker independent word recognition system based on the structured transition network of phonetic segments", In ICSLP-1990, 533-536.