5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

A Discriminative Training Algorithm for Gaussian Mixture Speaker Models

Jialong He, Li Liu, Günther Palm

Abteilung Neuroinformatik, University of Ulm, Germany

The Gaussian mixture speaker model (GMM) is usually trained with the expectation-maximization (EM) algorithm to maximize the likelihood (ML) of the observation data from an individual class. A GMM trained under the ML criterion has weak discriminative power when used as a classifier. In this paper, a discriminative training procedure is proposed to fine-tune the parameters of the GMMs. The goal of the training is to reduce the number of misclassified vector groups. Since a vector group can be thought of as derived from a short sentence, this training procedure optimizes speaker identification performance more directly. Although the algorithm is based on a heuristic idea, it works well on many practical problems, and training is fast. In an evaluation experiment with the YOHO database, when each speaker is modeled with 8 mixtures, the identification rate increases from 83.8% to 92.4% after applying this discriminative training algorithm.
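The abstract does not give the parameter update rule, so the following Python sketch illustrates only the quantity the training aims to reduce: the number of misclassified vector groups. It assumes diagonal-covariance GMMs and assumes a group is assigned to the speaker whose model yields the highest summed log-likelihood over the group's vectors. All function names, model parameters, and data are hypothetical and chosen purely for illustration; this is not the authors' implementation.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM (assumed form)."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # log-sum-exp over mixture components for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def classify_group(group, models):
    """Pick the speaker whose GMM gives the highest summed log-likelihood for the group."""
    scores = {
        speaker: sum(gmm_log_likelihood(x, *params) for x in group)
        for speaker, params in models.items()
    }
    return max(scores, key=scores.get)

def count_misclassified_groups(labelled_groups, models):
    """Number of vector groups assigned to a speaker other than their label."""
    return sum(
        1 for speaker, group in labelled_groups
        if classify_group(group, models) != speaker
    )

# Hypothetical toy models: (weights, means, variances) per speaker, 2 components each.
models = {
    "spk_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "spk_b": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# Hypothetical labelled vector groups; the third lies close to spk_b's model
# although it is labelled spk_a, so it counts as a misclassified group.
labelled_groups = [
    ("spk_a", [[0.1, -0.2], [0.9, 1.1]]),
    ("spk_b", [[4.2, 3.9], [5.1, 4.8]]),
    ("spk_a", [[4.5, 4.5], [4.0, 5.0]]),
]

errors = count_misclassified_groups(labelled_groups, models)
print(errors)
```

A discriminative procedure of the kind described would adjust the GMM parameters so that this count drops on the training data, whereas EM training fits each speaker's model to that speaker's data alone.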


Bibliographic reference.  He, Jialong / Liu, Li / Palm, Günther (1997): "A discriminative training algorithm for Gaussian mixture speaker models", In EUROSPEECH-1997, 959-962.