Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper describes a method for speaker-independent word recognition in noisy environments that uses dynamic and averaged spectral features of speech together with neural networks. The spectral features are obtained from a two-dimensional mel-cepstrum (TDMC), defined as the two-dimensional Fourier transform of mel-frequency-scaled log spectra along the frequency and time axes. Several regions of the dynamic and averaged spectral features in the TDMC of each word are used as training data for the neural networks. The networks are three-layer feed-forward networks trained with the back-propagation algorithm. To improve recognition performance in noisy environments, this study considers both the order in which the training data are presented and the SNR of the training data. Experimental results for speaker-independent recognition of the ten Japanese digits show that the proposed method gives better results than a conventional method, especially in low-SNR environments.
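The TDMC definition above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the mel-frequency-scaled log spectra are already available as a matrix with one row per frame, uses a plain discrete Fourier transform, and normalizes by the number of matrix elements; the abstract specifies neither the normalization nor the sign convention. The function names `dft`, `tdmc`, and `select_region` are hypothetical.

```python
import cmath
import math


def dft(values):
    """Plain O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def tdmc(log_mel):
    """Two-dimensional DFT of a (frames x mel bins) matrix of log spectra.

    Returns c[m][q], where m indexes the transform along the time axis and
    q indexes the transform along the mel-frequency axis.
    Assumption: coefficients are scaled by 1 / (frames * bins).
    """
    n_frames = len(log_mel)
    n_bins = len(log_mel[0])
    # Step 1: transform every frame along the frequency axis.
    rows = [dft(frame) for frame in log_mel]
    # Step 2: transform every resulting column along the time axis.
    cols = [dft([rows[t][q] for t in range(n_frames)]) for q in range(n_bins)]
    scale = 1.0 / (n_frames * n_bins)
    return [[cols[q][m] * scale for q in range(n_bins)] for m in range(n_frames)]


def select_region(c, time_indices, freq_indices):
    """Flatten a rectangular region of TDMC coefficients into a feature list."""
    return [c[m][q] for m in time_indices for q in freq_indices]


# Toy input: 2 frames x 4 mel bins of made-up log-spectral values.
toy = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
coeffs = tdmc(toy)
averaged = select_region(coeffs, [0], range(3))  # time index 0: time-averaged part
dynamic = select_region(coeffs, [1], range(3))   # non-zero time index: variation over time
```

In this sketch, the row at time index 0 is the transform of the time-averaged log spectrum, so it plays the role of the averaged features, while rows at non-zero time indices capture how the spectrum changes across frames. Which regions the paper actually feeds to the networks is not stated in the abstract, so the index ranges above are placeholders.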
Bibliographic reference. Kitamura, Tadashi / Ando, Satoshi / Hayahara, Etsuro (1992): "Speaker-independent spoken digit recognition in noisy environments using dynamic spectral features and neural networks", in ICSLP-1992, 699-702.