Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020

Peng Shen, Xugang Lu, Hisashi Kawai


In this paper, we describe the NICT speaker verification system for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. We first present the details of the training data and feature preparation. We then investigate x-vector-based front-ends with different network configurations, together with several back-ends: probabilistic linear discriminant analysis (PLDA), simplified PLDA, cosine similarity, and neural network-based PLDA. Finally, we apply a greedy fusion and calibration approach to select and combine the subsystems. To improve performance on short-duration evaluation data, we also investigate how to reduce the duration mismatch between the training and test datasets. Experimental results show that our primary fusion yielded a minDCF of 0.074 and an EER of 1.50% on the evaluation subset, the second-best result in the text-independent speaker verification task.
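As a rough illustration of two terms used in the abstract, the sketch below scores a trial with cosine similarity between two speaker embeddings and estimates an equal error rate (EER) from lists of target and non-target scores. It is not the authors' implementation: the function names `cosine_score` and `compute_eer` are hypothetical, the embeddings are assumed to be plain fixed-length vectors, and the EER is approximated by a simple threshold sweep over the observed scores rather than by any particular toolkit's method.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding (hypothetical helper)."""
    enroll = np.asarray(enroll, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def compute_eer(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: target trials scored below the threshold.
    fnr = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: non-target trials scored at or above the threshold.
    fpr = np.array([(nontarget >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(fnr - fpr)))
    return float((fnr[idx] + fpr[idx]) / 2)
```

With well-separated scores the estimate approaches zero, and with heavily overlapping scores it approaches 0.5. A threshold sweep like this is coarse on small trial lists, so evaluation toolkits typically interpolate between operating points instead.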


DOI: 10.21437/Interspeech.2020-2351

Cite as: Shen, P., Lu, X., Kawai, H. (2020) Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020. Proc. Interspeech 2020, 751-755, DOI: 10.21437/Interspeech.2020-2351.


@inproceedings{Shen2020,
  author={Peng Shen and Xugang Lu and Hisashi Kawai},
  title={{Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={751--755},
  doi={10.21437/Interspeech.2020-2351},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2351}
}