EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Improving Speaker Recognition Using Phonetically Structured Gaussian Mixture Models

Robert Faltlhauser, Günther Ruske

Technische Universität München, Germany

Gaussian Mixture Models (GMMs) are highly suitable for speaker identification and verification. However, these models primarily represent the distribution of the available training data and neglect any phonetic information that might be useful. In this paper we present a recognition system that uses multiple GMMs per speaker, one for each phonetic class. Introducing 'phonetic' mixture coefficients allows phoneme classes to be weighted according to how well they discriminate between speakers. Because these coefficients are integrated implicitly into the probability computation, no phonetic labeling is needed during recognition. The mixture weights can be learned in a training phase. Model training was examined using MAP enrolment and the recently reported Eigenvoice approach; phonetic separation is especially advantageous for the latter. Relative reductions in recognition error of up to 15% were achieved. Furthermore, the multiple-GMM approach is particularly effective for speaker enrolment with sparse training data.
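The abstract does not state the scoring formula, so the following is only a minimal sketch of one plausible reading: each speaker has one diagonal-covariance GMM per phonetic class, and the per-frame likelihood is the sum of the class GMM densities weighted by the 'phonetic' mixture coefficients, which is why no phonetic labels are needed at recognition time. The function names, the parameter layout, and the choice of diagonal covariances are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """Per-frame log density of a diagonal-covariance GMM.

    x: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    Returns an array of shape (T,).
    """
    diff = x[:, None, :] - means[None, :, :]                  # (T, M, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                 # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)     # normalisation term
    )                                                         # (T, M)
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1)

def structured_log_likelihood(x, class_gmms, class_weights):
    """Utterance log-likelihood under a phonetically structured speaker model.

    class_gmms: list of (weights, means, variances), one GMM per phonetic class
                (hypothetical layout).
    class_weights: (K,) 'phonetic' mixture coefficients, assumed to sum to 1.
    The sum over classes is taken per frame, so frames need no phonetic labels.
    """
    per_class = np.stack(
        [gmm_log_density(x, w, m, v) for (w, m, v) in class_gmms], axis=1
    )                                                         # (T, K)
    per_frame = np.logaddexp.reduce(np.log(class_weights) + per_class, axis=1)
    return float(per_frame.sum())
```

Under this assumed form, identification would pick the speaker whose structured model gives the highest utterance log-likelihood. How the phonetic coefficients are trained, and how MAP or Eigenvoice enrolment adapts the class GMMs, is described in the full paper and is not reproduced here.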

Bibliographic reference. Faltlhauser, Robert / Ruske, Günther (2001): "Improving speaker recognition using phonetically structured Gaussian mixture models", in EUROSPEECH-2001, 751-754.