Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck


In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge lies in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. This sample mining technique efficiently exploits the distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced on the domain level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort contains only data from the Farsi target domain, mirroring the enrollment data, which is always Farsi. If a Gaussian backend language model detects that the test speaker embedding contains English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45%, respectively, on the SdSVC evaluation set.
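The two sketches below give a rough, non-authoritative illustration of the ideas summarized above; they are not the authors' implementation. They assume cosine scoring on length-normalized speaker embeddings and AAM-softmax prototypes taken from the rows of the classifier weight matrix, and every function and parameter name used here (domain_balanced_hard_batches, language_dependent_snorm, speakers_per_domain, max_imposter_mean, cross_lang_offset, top_n) is hypothetical.

First, a minimal sketch of domain-balanced hard prototype mining: each batch draws the same number of speakers from every domain, combining a random anchor speaker with the speakers whose prototypes lie closest to it, i.e. the speakers the model currently confuses most easily.

import numpy as np

def domain_balanced_hard_batches(prototypes, speaker_domains,
                                 speakers_per_domain=16, seed=0):
    """Yield batches of speaker indices: per domain, a random anchor speaker plus
    its most confusable (closest-prototype) peers, so every batch is both hard
    and domain-balanced. An outer training loop would then sample a few
    utterances per selected speaker. Names and batch layout are assumptions."""
    speaker_domains = np.asarray(speaker_domains)
    rng = np.random.default_rng(seed)

    # Cosine similarity between AAM-softmax prototypes; highly similar
    # prototypes correspond to speakers that are hard to separate.
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = P @ P.T

    # One shuffled pool of speaker indices per domain (e.g. VoxCeleb / DeepMine).
    pools = {d: [int(i) for i in rng.permutation(np.flatnonzero(speaker_domains == d))]
             for d in np.unique(speaker_domains)}

    while all(len(pool) >= speakers_per_domain for pool in pools.values()):
        batch = []
        for pool in pools.values():
            anchor = pool.pop()                       # random anchor speaker
            remaining = np.array(pool)
            # Hardest imposters: remaining speakers closest to the anchor prototype.
            hardest = remaining[np.argsort(-sim[anchor, remaining])
                                [:speakers_per_domain - 1]]
            batch.extend([anchor, *map(int, hardest)])
            for s in hardest:
                pool.remove(s)
        yield batch

Second, a minimal sketch of the language-dependent adaptive s-norm: the imposter cohort is Farsi-only, and when the test segment is flagged as English the test-side imposter mean is replaced by an assumed maximum expected imposter mean minus the cross-language compensation offset. The exact quantity the offset is applied to in the paper may differ from this simplification.

def language_dependent_snorm(raw_score, enroll_emb, test_emb, farsi_cohort,
                             test_is_english, max_imposter_mean, cross_lang_offset,
                             top_n=200):
    """Adaptive s-norm against a Farsi-only imposter cohort. `max_imposter_mean`
    and the placement of `cross_lang_offset` are illustrative assumptions."""
    def top_imposter_stats(emb):
        emb = emb / np.linalg.norm(emb)
        cohort = farsi_cohort / np.linalg.norm(farsi_cohort, axis=1, keepdims=True)
        scores = np.sort(cohort @ emb)[::-1][:top_n]  # top-N most competitive imposters
        return scores.mean(), scores.std()

    mu_e, sigma_e = top_imposter_stats(enroll_emb)    # enrollment side (always Farsi)
    mu_t, sigma_t = top_imposter_stats(test_emb)      # test side

    if test_is_english:
        # Cross-lingual trial: compensate the language mismatch with the Farsi
        # cohort; the paper derives the offset from the AAM-softmax prototypes.
        mu_t = max_imposter_mean - cross_lang_offset

    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)

In use, a language identification backend (a Gaussian backend in the paper) would supply the test_is_english flag per trial, and the same Farsi cohort embeddings would be reused across all trials.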


DOI: 10.21437/Interspeech.2020-2662

Cite as: Thienpondt, J., Desplanques, B., Demuynck, K. (2020) Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization. Proc. Interspeech 2020, 756-760, DOI: 10.21437/Interspeech.2020-2662.


@inproceedings{Thienpondt2020,
  author={Jenthe Thienpondt and Brecht Desplanques and Kris Demuynck},
  title={{Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={756--760},
  doi={10.21437/Interspeech.2020-2662},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2662}
}