First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes a speaker-independent word recognition method for noisy environments that uses dynamic and averaged spectral features based on a two-dimensional mel-cepstrum (TDMC). A TDMC is defined as the two-dimensional Fourier transform of mel-frequency-scaled log spectra in the frequency and time domains, and it consists of the averaged and dynamic features of the two-dimensional mel-log spectrum in the analyzed interval. The method uses distance measures based on the averaged and dynamic spectral features of the TDMC of the analyzed word. Furthermore, one of the noise-added reference pattern sets is used to improve the method. Speaker-independent word recognition experiments in white noise and colored noise, for ten Japanese digits uttered by ten male speakers, show the effectiveness of the method. Using a 20 dB reference pattern set, the method gives recognition error rates lower than 5% and 2% for white-noise-added speech and colored-noise-added speech of 20 and 10 dB SNR, respectively. The method gives better recognition rates than a standard method that uses a one-dimensional mel-cepstrum representing instantaneous spectral features.
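The TDMC definition above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the mel-log spectra of one word are already available as a matrix of shape (frames, mel bands), and the function names `tdmc` and `split_features` as well as the truncation sizes `n_quef` and `n_mod` are hypothetical choices made only for illustration. The row at temporal frequency zero carries the time-averaged spectral shape, and the rows at nonzero temporal frequency carry the dynamic features.

```python
import numpy as np


def tdmc(mel_log_spec):
    """Two-dimensional Fourier transform of a (frames x mel bands) matrix.

    Axis 0 (time) maps to temporal frequency, axis 1 (mel frequency)
    maps to a quefrency-like index.
    """
    return np.fft.fft2(mel_log_spec)


def split_features(coeffs, n_quef=8, n_mod=3):
    """Split TDMC coefficients into averaged and dynamic parts.

    n_quef and n_mod are illustrative truncation sizes (assumptions),
    not values taken from the paper.
    """
    averaged = coeffs[0, :n_quef]             # temporal frequency 0
    dynamic = coeffs[1:n_mod + 1, :n_quef]    # lowest nonzero temporal frequencies
    return averaged, dynamic


# Synthetic stand-in for the mel-log spectra of one word: 32 frames, 16 mel bands.
rng = np.random.default_rng(0)
spec = rng.normal(size=(32, 16))

coeffs = tdmc(spec)
averaged, dynamic = split_features(coeffs)
```

In this sketch the zero-temporal-frequency row equals the one-dimensional transform of the spectrum summed over all frames, which is why it can be read as the averaged feature of the analyzed interval.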
Bibliographic reference. Kitamura, Tadashi / Hayahara, Etsuro / Simazaki, Yasuhiko (1990): "Speaker-independent word recognition in noisy environments using dynamic and averaged spectral features based on a two-dimensional mel-cepstrum", In ICSLP-1990, 1129-1132.