EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


A Segmental Mixture Model for Speaker Recognition

Robert P. Stapert, John S. Mason

University of Wales Swansea, UK

Standard Gaussian mixture modelling (GMM) captures no time sequence information (TSI) beyond what may be embedded in the acoustic features. Dynamic time warping (DTW), by contrast, exploits TSI directly by warping two feature sequences into alignment. This paper presents a hybrid system that embeds DTW in a GMM and demonstrates improved automatic speaker verification performance. In tests on 1000 speakers in a fully text-independent, world-model-adapted mode, the equal error rate falls from 4.1% for a standard GMM to 3.8% for the hybrid system.
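
For readers unfamiliar with DTW, the alignment step mentioned in the abstract can be illustrated with a minimal sketch of classic dynamic time warping between two feature sequences. This is not the hybrid segmental model of the paper, whose details the abstract does not give. The function name `dtw_distance`, the Euclidean local distance, and the unconstrained three-way step pattern are all assumptions chosen for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Cumulative cost of the best DTW alignment between two sequences.

    a, b: sequences of feature frames, shape (n, d) and (m, d).
    1-D inputs are treated as sequences of scalar frames.
    Illustrative only: Euclidean local distance, no slope constraints.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)

    # local[i, j]: Euclidean distance between frame i of a and frame j of b
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # cost[i, j]: cheapest alignment of the first i frames of a
    # with the first j frames of b (row/column 0 are padding)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j],      # frame of b repeated
                            cost[i, j - 1],      # frame of a repeated
                            cost[i - 1, j - 1])  # both sequences advance
            cost[i, j] = local[i - 1, j - 1] + best_prev
    return cost[n, m]


# The second sequence repeats one frame; warping absorbs the repeat.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # → 0.0
```

The zero cost in the usage line shows the point made in the abstract: DTW compares sequences by their ordering in time, so a difference in duration alone does not count as a mismatch.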


Bibliographic reference.  Stapert, Robert P. / Mason, John S. (2001): "A segmental mixture model for speaker recognition", In EUROSPEECH-2001, 2509-2512.