Content Normalization for Text-Dependent Speaker Verification

Subhadeep Dey, Srikanth Madikeri, Petr Motlicek, Marc Ferras


Subspace-based techniques, such as i-vectors and Joint Factor Analysis (JFA), have been shown to provide state-of-the-art performance for fixed-phrase text-dependent speaker verification. However, on the random-digit task of the RSR dataset, the error rates of such systems are higher than those of a Gaussian Mixture Model-Universal Background Model (GMM-UBM) system. In this paper, we aim to improve the i-vector system by normalizing the content of the enrollment data to match that of the test data. We estimate an i-vector for each frame of a speech utterance (so-called online i-vectors). Using these online i-vectors, we take the largest similarity scores across frames between the enrollment and test data to obtain speaker verification scores. Experiments on Part 3 of the RSR corpus show that the proposed approach achieves a 12% relative improvement in equal error rate over a GMM-UBM baseline system.
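The frame-level scoring idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or how per-frame maxima are pooled, so this sketch assumes cosine similarity between online i-vectors and assumes that, for each test frame, the best-matching enrollment frame is kept and the maxima are averaged. The function names `cosine` and `content_normalized_score` are hypothetical, and the online i-vectors are taken as given (their extraction is not shown).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_normalized_score(enroll: List[Vector], test: List[Vector]) -> float:
    """Hypothetical utterance score from per-frame (online) i-vectors.

    Assumption: for every test frame, only the most similar enrollment
    frame is kept (max cosine similarity), so the comparison relies on
    enrollment frames whose content best resembles the test frame.
    The per-test-frame maxima are then averaged into a single score.
    """
    if not enroll or not test:
        raise ValueError("need at least one enrollment and one test frame")
    best_per_test_frame = [max(cosine(t, e) for e in enroll) for t in test]
    return sum(best_per_test_frame) / len(best_per_test_frame)


if __name__ == "__main__":
    # Toy 2-dimensional "online i-vectors" for illustration only.
    enroll_frames = [[1.0, 0.0], [0.0, 1.0]]
    test_frames = [[1.0, 0.0], [0.6, 0.8]]
    # Per-frame maxima are 1.0 and 0.8, so the score is about 0.9.
    print(content_normalized_score(enroll_frames, test_frames))
```

Taking the maximum over enrollment frames, rather than averaging all frame pairs, is one simple way to let each test frame be compared only against enrollment material with similar content; other pooling choices are equally plausible under the description above.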


DOI: 10.21437/Interspeech.2017-1419

Cite as: Dey, S., Madikeri, S., Motlicek, P., Ferras, M. (2017) Content Normalization for Text-Dependent Speaker Verification. Proc. Interspeech 2017, 1482-1486, DOI: 10.21437/Interspeech.2017-1419.


@inproceedings{Dey2017,
  author={Subhadeep Dey and Srikanth Madikeri and Petr Motlicek and Marc Ferras},
  title={Content Normalization for Text-Dependent Speaker Verification},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1482--1486},
  doi={10.21437/Interspeech.2017-1419},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1419}
}