Compensation for Domain Mismatch in Text-independent Speaker Recognition

Fahimeh Bahmaninezhad, John H.L. Hansen

Domain mismatch continues to be a major research challenge for speaker recognition in naturalistic audio streams. This study presents a new technique for domain mismatch compensation within a text-independent speaker recognition scenario. The proposed method is designed for the NIST speaker recognition evaluation 2016 (SRE16) task, where the speakers in the training, development and evaluation data belong to different sets of languages. An i-vector/PLDA speaker recognition system is adopted for this study. To address the mismatch problem, we propose appending auxiliary features to the i-vectors. These auxiliary features are representations of the i-vectors adapted to the specific in-domain data; therefore, the new feature vector has two parts: (1) i-vectors, which represent speaker identity, and (2) auxiliary features, which are representations of the i-vectors in the in-domain data feature space (and may not contain speaker identity information). This new concatenated feature vector (which we call an a-vector) is then post-processed with support vector discriminant analysis (SVDA) for further domain compensation. Evaluations on SRE16 confirm the effectiveness of the proposed technique. In terms of minimum Cprimary cost, the a-vector consistently outperforms the i-vector. Moreover, compared to previous single systems introduced for SRE16, we achieved improvements of 8.5% to 18% in terms of equal error rate.
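To make the two-part structure of the a-vector concrete, the following is a minimal sketch of the concatenation step only. The abstract does not say how the auxiliary features are computed, so this sketch substitutes a plain PCA projection fitted on unlabeled in-domain i-vectors as a stand-in; that choice, the function names `fit_in_domain_projection` and `build_a_vectors`, the dimensions, and the random data are all assumptions made for illustration and are not the authors' method. The SVDA post-processing and PLDA scoring stages are omitted.

```python
import numpy as np


def fit_in_domain_projection(in_domain_ivectors, aux_dim):
    """Fit a PCA basis on in-domain i-vectors.

    Assumption: PCA is only a placeholder for the adaptation step,
    which the abstract does not specify.
    """
    mean = in_domain_ivectors.mean(axis=0)
    centered = in_domain_ivectors - mean
    # Rows of vt are the principal directions of the in-domain data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:aux_dim].T  # shape: (ivector_dim, aux_dim)
    return mean, basis


def build_a_vectors(ivectors, mean, basis):
    """Concatenate each i-vector with its in-domain auxiliary features."""
    auxiliary = (ivectors - mean) @ basis
    return np.concatenate([ivectors, auxiliary], axis=1)


# Hypothetical data: 200 in-domain i-vectors and 5 test i-vectors of dimension 20.
rng = np.random.default_rng(0)
in_domain = rng.normal(size=(200, 20))
ivectors = rng.normal(size=(5, 20))

mean, basis = fit_in_domain_projection(in_domain, aux_dim=8)
a_vectors = build_a_vectors(ivectors, mean, basis)
print(a_vectors.shape)  # (5, 28)
```

The first 20 columns of each row are the original i-vector, left unchanged, and the remaining 8 columns are the placeholder auxiliary features. In a full system the concatenated vectors would then go through a discriminant transform such as SVDA before PLDA scoring.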

 DOI: 10.21437/Interspeech.2018-1446

Cite as: Bahmaninezhad, F., Hansen, J.H. (2018) Compensation for Domain Mismatch in Text-independent Speaker Recognition. Proc. Interspeech 2018, 1071-1075, DOI: 10.21437/Interspeech.2018-1446.

  author={Fahimeh Bahmaninezhad and John H.L. Hansen},
  title={Compensation for Domain Mismatch in Text-independent Speaker Recognition},
  booktitle={Proc. Interspeech 2018},