4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Speaker Recognition Model using Two-dimensional Mel-Cepstrum and Predictive Neural Network

Tadashi Kitamura, Shinsai Takei

Dept. of Intelligence and Computer Science, Nagoya Institute of Technology, Japan

This paper describes a speaker recognition model that uses the two-dimensional mel-cepstrum (TDMC) and a predictive neural network. The speaker model consists of two networks. The first is a self-organizing VQ map (Kohonen's feature map). The second is a predictive network that learns the transitional patterns on the feature map of each speaker's model. TDMC consists of averaged features and dynamic features of the two-dimensional mel-log spectra in the analyzed interval. The measure for speaker recognition is obtained by combining the VQ distortion on the feature map with the prediction error of the predictive network. Text-independent speaker identification experiments were carried out for 8 speakers. The experimental results show that the combination of a feature map and a predictive network is very effective, and that the proposed model using TDMC is robust with respect to the time interval.
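
The recognition measure described above can be illustrated with a minimal sketch. The abstract does not state how the two terms are combined, so the weighted sum below is an assumption, as are all names (`combined_score`, `identify`, `weight`) and the choice to predict the next codeword from the previous one. The predictor is passed in as a plain function standing in for the trained predictive network, and the feature vectors stand in for TDMC frames.

```python
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], Vector]  # hypothetical stand-in for the predictive network


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], frame: Vector) -> Tuple[int, float]:
    """Return (index, distortion) of the codeword closest to the frame."""
    dists = [sq_dist(c, frame) for c in codebook]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def combined_score(frames: Sequence[Vector], codebook: Sequence[Vector],
                   predict: Predictor, weight: float = 1.0) -> float:
    """Average VQ distortion plus weighted average prediction error.

    Assumption: the two terms are combined as a weighted sum.
    """
    quantized = [nearest(codebook, f) for f in frames]
    vq = sum(d for _, d in quantized) / len(frames)
    # Prediction error: predict the next codeword from the previous one.
    errs = [
        sq_dist(predict(codebook[quantized[t - 1][0]]), codebook[quantized[t][0]])
        for t in range(1, len(frames))
    ]
    pred = sum(errs) / len(errs) if errs else 0.0
    return vq + weight * pred


def identify(frames: Sequence[Vector],
             models: Dict[str, Tuple[Sequence[Vector], Predictor]],
             weight: float = 1.0) -> str:
    """Pick the speaker whose model gives the smallest combined score."""
    return min(
        models,
        key=lambda name: combined_score(frames, models[name][0], models[name][1], weight),
    )


# Toy data: speaker "A" alternates between two codewords; "B" lives elsewhere.
models = {
    "A": ([(0.0, 0.0), (1.0, 1.0)], lambda v: (1.0 - v[0], 1.0 - v[1])),
    "B": ([(5.0, 5.0), (6.0, 6.0)], lambda v: v),
}
frames = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
best = identify(frames, models)
```

In this toy setup the utterance matches both the codebook and the transition pattern of speaker "A", so "A" receives the lower score. With `weight=0.0` the measure reduces to plain VQ distortion, which is one way to see what the prediction term contributes.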

Bibliographic reference. Kitamura, Tadashi / Takei, Shinsai (1996): "Speaker recognition model using two-dimensional mel-cepstrum and predictive neural network", in ICSLP-1996, 1772-1775.