The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020

Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong


In this paper, we present our XMUSPEECH system for Task 1 of the Short-duration Speaker Verification (SdSV) Challenge. Task 1 is a Text-Dependent (TD) mode in which speaker verification systems are required to automatically determine whether a test segment containing a specific phrase belongs to the target speaker. We refined the system pipeline in three areas: data processing, front-end training, and back-end processing. In addition, we explored training strategies such as spectrogram augmentation and transfer learning. The experimental results show that these strategies are effective: our best single system, a transferred model with spectrogram augmentation and attentive statistics pooling, significantly outperforms the official baseline on both the progress subset and the evaluation subset. Finally, a fusion of seven subsystems was chosen as our primary system, which yielded minDCF values of 0.0856 and 0.0862 on the progress subset and the evaluation subset, respectively.
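
As an illustration of the spectrogram augmentation mentioned above, the following is a minimal sketch of one common form of the technique: zeroing a random band of frequency bins and a random span of frames in a feature matrix. It is not the authors' implementation. The function name `spec_augment`, the mask widths, and the choice of zero as the fill value are all assumptions made for this example.

```python
import numpy as np


def spec_augment(feats, max_freq_mask=5, max_time_mask=10, rng=None):
    """Mask one random frequency band and one random time span with zeros.

    feats: array of shape (num_frames, num_bins), e.g. log Mel filterbanks.
    max_freq_mask, max_time_mask: hypothetical upper bounds on mask widths.
    Returns a masked copy; the input array is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = feats.copy()
    num_frames, num_bins = out.shape

    # Frequency mask: pick a width in [0, max_freq_mask], then a start bin.
    f_width = int(rng.integers(0, min(max_freq_mask, num_bins) + 1))
    f_start = int(rng.integers(0, num_bins - f_width + 1))
    out[:, f_start:f_start + f_width] = 0.0

    # Time mask: pick a width in [0, max_time_mask], then a start frame.
    t_width = int(rng.integers(0, min(max_time_mask, num_frames) + 1))
    t_start = int(rng.integers(0, num_frames - t_width + 1))
    out[t_start:t_start + t_width, :] = 0.0

    return out


# Example: augment a dummy 200-frame, 40-bin feature matrix.
feats = np.ones((200, 40))
augmented = spec_augment(feats, rng=np.random.default_rng(0))
```

In a training loop, such a function would typically be applied to each utterance's features on the fly, so the network sees a differently masked version in each epoch.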


DOI: 10.21437/Interspeech.2020-1704

Cite as: Jiang, T., Zhao, M., Li, L., Hong, Q. (2020) The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020. Proc. Interspeech 2020, 736-740, DOI: 10.21437/Interspeech.2020-1704.


@inproceedings{Jiang2020,
  author={Tao Jiang and Miao Zhao and Lin Li and Qingyang Hong},
  title={{The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={736--740},
  doi={10.21437/Interspeech.2020-1704},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1704}
}