Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


Time-Delay Neural Network Architectures for High-Performance Speaker-Independent Recognition

Hidefumi Sawai (1,2), Satoru Nakamura (3)

(1) ATR Interpreting Telephony Research Laboratories, Kyoto, Japan
(2) Ricoh Co., Ltd., Yokohama, Japan
(3) Faculty of Science and Technology, Keio University, Japan

Several Time-Delay Neural Network (TDNN) architectures applied to speaker-dependent and multi-speaker phoneme recognition are compared with respect to their capabilities on a speaker-independent phoneme recognition problem. Phoneme recognition experiments on the voiced stops /b, d, g/ using six and twelve training speakers showed high average recognition rates of 91.3% and 93.6%, respectively, for eight test speakers. In addition, constructing networks from speaker modules is effective in saving training time and leads to higher recognition performance than a single TDNN structure with comparable network capacity. Furthermore, we propose an extended architecture for recognizing all phonemes based on the results of this paper.
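
The abstract does not give layer sizes or say how the speaker modules are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It shows a time-delay layer (the same weights applied at every time shift) and a modular classifier built from several small TDNNs. The layer sizes, the 15-frame by 16-coefficient input, the random untrained weights, the averaging of module outputs, and the names `time_delay_layer`, `tdnn_scores`, `init_params`, and `modular_scores` are all assumptions made for illustration.

```python
import numpy as np

def time_delay_layer(x, w, b):
    """Apply one time-delay layer.

    x: (T, F) sequence of T frames with F features each.
    w: (D, F, H) weights spanning D consecutive frames, H output units.
    b: (H,) biases.
    Returns a (T - D + 1, H) array of sigmoid activations.
    """
    d = w.shape[0]
    t_out = x.shape[0] - d + 1
    # Collect every window of D consecutive frames: shape (t_out, D, F).
    windows = np.stack([x[t:t + d] for t in range(t_out)])
    # The same weights are shared across all time shifts.
    pre = np.tensordot(windows, w, axes=([1, 2], [0, 1])) + b
    return 1.0 / (1.0 + np.exp(-pre))

def tdnn_scores(x, params):
    """Run stacked time-delay layers, then average over time per class."""
    h = x
    for w, b in params:
        h = time_delay_layer(h, w, b)
    return h.mean(axis=0)

def init_params(rng, n_features=16, n_classes=3):
    """Random, untrained weights; the layer shapes are illustrative only."""
    shapes = [(3, n_features, 8), (5, 8, n_classes)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[-1])) for s in shapes]

def modular_scores(x, modules):
    """Combine per-speaker modules by averaging (an assumed rule)."""
    return np.mean([tdnn_scores(x, p) for p in modules], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(15, 16))                   # hypothetical input token
modules = [init_params(rng) for _ in range(6)]  # one module per training speaker
scores = modular_scores(x, modules)
labels = ["b", "d", "g"]
print(labels[int(np.argmax(scores))], scores)
```

In a setup like this, each module could be trained on its own speaker's data before the modules are combined, which is one plausible reading of how modular construction might save training time; the abstract itself does not specify the procedure.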


Bibliographic reference. Sawai, Hidefumi / Nakamura, Satoru (1991): "Time-delay neural network architectures for high-performance speaker-independent recognition", In EUROSPEECH-1991, 1011-1014.