Several Time-Delay Neural Network(TDNN) architectures applied to speaker-dependent and multi-speaker's phoneme recognition are compared with respect to their capabilities on a speaker-independent phoneme recognition problem. Phoneme experiments for recognizing voiced stops /b, d, g/ using six and twelve training speakers showed high average recognition rates of 91. 3% and 93. 6%, respectively for eight test speakers. In addition, constructing networks by speakers' modules is effective in terms of saving training time, and leads to higher recognition performance than a single structure of TDNN with comparable network capacity. Furthermore, we propose an extended architecture for recognizing all phonemes based on the achievements in this paper.
Bibliographic reference. Sawai, Hidefumi / Nakamura, Satoru (1991): "Time-delay neural network architectures for high-performance speaker-independent recognition", In EUROSPEECH-1991, 1011-1014.