EUROSPEECH 2001 Scandinavia
Gaussian mixture models (GMMs) are well suited to speaker identification and verification. However, these models primarily represent the distribution of the available training data and neglect phonetic information that may be useful. In this paper we present a recognition system that uses multiple speaker GMMs based on phonetic classes. Introducing 'phonetic' mixture coefficients makes it possible to weight phoneme classes according to their contribution to speaker recognizability. Because the coefficients are integrated implicitly into the probability computation, no phonetic labeling is needed during recognition. The mixture weights can be learned in a training phase. Model training was examined using MAP enrolment and the recently reported Eigenvoice approach; phonetic separation is especially advantageous for the latter. Relative reductions in recognition error of up to 15% were achieved. Furthermore, the multiple-GMM approach is particularly effective for speaker enrolment with sparse training data.
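The implicit integration of phonetic mixture coefficients can be illustrated with a minimal sketch. It assumes, since the abstract does not give the exact formulation, that a speaker's frame likelihood is a weighted sum over one diagonal-covariance GMM per phonetic class, with the class weights playing the role of the 'phonetic' mixture coefficients. All function and parameter names (`speaker_loglik`, `class_gmms`, `class_weights`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def class_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of frame x under the GMM of one phonetic class."""
    comp = [np.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comp)

def speaker_loglik(frames, class_gmms, class_weights):
    """Total log-likelihood of all frames for one speaker.

    Assumption: each frame is scored as sum_k c_k * p(x | GMM_k), so the
    phonetic class is marginalised out and no phonetic labels are needed.
    """
    total = 0.0
    for x in frames:
        per_class = [np.log(c) + class_gmm_loglik(x, *gmm)
                     for c, gmm in zip(class_weights, class_gmms)]
        total += np.logaddexp.reduce(per_class)
    return total
```

Under this assumption, identification would pick the speaker whose class GMMs and class weights give the highest `speaker_loglik` for the test frames. Summing over classes inside the likelihood is what removes the need for a phonetic labeling at recognition time.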
Bibliographic reference. Faltlhauser, Robert / Ruske, Günther (2001): "Improving speaker recognition using phonetically structured Gaussian mixture models", In EUROSPEECH-2001, 751-754.