EUROSPEECH 2001 Scandinavia
This paper describes a GMM-based speaker verification system that uses speaker-dependent background models transformed by speaker-specific maximum likelihood linear transforms to achieve a sharper separation between the target and the nontarget acoustic region. The effect of tying, or coupling, Gaussian components between the target and the background model is studied and shown to be a relevant factor with respect to the desired operating point. Also presented is a neural-network fusion of scores from multiple systems built on different acoustic features, which yields performance gains over linear combination. Results obtained on the 1999 speaker recognition evaluation set indicate reductions of the minimum detection cost of up to 13% for all tests and up to 25% for electret-only tests, as compared to a baseline GMM system. The neural fusion of three systems yields a further 5% cost reduction.
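As background for readers unfamiliar with GMM-based verification, the following is a minimal sketch of the conventional log-likelihood-ratio score between a target model and a background model. It is not the authors' system: the abstract gives no scoring formula, and the sketch omits the linear transforms, the component tying, and the neural fusion. The function names, the toy model parameters, and the zero decision threshold are all hypothetical and chosen only for illustration.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalisation constant for this component.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Log of the exponential term (scaled squared distance to the mean).
        log_exp = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def llr_score(frames, target, background):
    """Average per-frame log-likelihood ratio: target model vs. background model."""
    total = 0.0
    for x in frames:
        total += diag_gmm_loglik(x, *target) - diag_gmm_loglik(x, *background)
    return total / len(frames)

# Toy models (assumed values): each is (weights, means, variances).
target = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
background = ([1.0], [[3.0, 3.0]], [[1.0, 1.0]])

frames = [[0.1, -0.2], [0.9, 1.1]]  # two 2-dimensional feature vectors
score = llr_score(frames, target, background)
accept = score > 0.0  # hypothetical threshold; in practice it sets the operating point
```

In a sketch like this, moving the threshold trades false acceptances against false rejections, which is the sense in which an operating point is chosen.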
Bibliographic reference. Navratil, Jiri / Chaudhari, Upendra V. / Ramaswamy, Ganesh N. (2001): "Speaker verification using target and background dependent linear transforms and multi-system fusion", In EUROSPEECH-2001, 1389-1392.