Improving X-Vector and PLDA for Text-Dependent Speaker Verification

Zhuxin Chen, Yue Lin

Recently, the pipeline consisting of an x-vector speaker embedding front-end and a Probabilistic Linear Discriminant Analysis (PLDA) back-end has achieved state-of-the-art results in text-independent speaker verification. In this paper, we further improve the performance of x-vector and PLDA based systems for text-dependent speaker verification by exploring which layer to extract the embedding from and by modifying the back-end training strategy. In particular, we show that x-vector based embeddings, specifically the standard deviation statistics in the pooling layer, carry information about both speaker characteristics and spoken content. Accordingly, we modify the back-end training labels to use both the speaker-id and the phrase-id. A correlation-alignment-based PLDA adaptation is also adopted to make use of text-independent labeled data during back-end training. Experimental results on the SDSVC 2020 dataset show that the proposed methods achieve significant performance improvements over the x-vector and HMM based i-vector baselines.
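The two back-end ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `joint_labels` and `coral_align`, the `eps` regularizer, and the mean shift are assumptions made for illustration, and the alignment shown is plain feature-level correlation alignment (CORAL) applied to embeddings, which may differ from the PLDA-level adaptation used in the paper.

```python
import numpy as np


def joint_labels(speaker_ids, phrase_ids):
    """Treat each (speaker, phrase) pair as one back-end training class."""
    return [f"{s}_{p}" for s, p in zip(speaker_ids, phrase_ids)]


def _sym_matrix_power(cov, power):
    # Power of a symmetric positive-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    return (vecs * vals**power) @ vecs.T


def coral_align(source, target, eps=1e-3):
    """Recolor source embeddings so their covariance matches the target's.

    source: (n_source, dim) out-of-domain (e.g. text-independent) embeddings
    target: (n_target, dim) in-domain (e.g. text-dependent) embeddings
    eps:    hypothetical ridge term that keeps both covariances invertible
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    centered = source - source.mean(axis=0)
    # Whiten with the source covariance, then recolor with the target one.
    whitened = centered @ _sym_matrix_power(cov_s, -0.5)
    # Shifting to the target mean is an extra choice, not part of basic CORAL.
    return whitened @ _sym_matrix_power(cov_t, 0.5) + target.mean(axis=0)
```

Under these assumptions, the aligned text-independent embeddings could be pooled with the in-domain data before the back-end is trained on the joint speaker-phrase classes.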

DOI: 10.21437/Interspeech.2020-1188

Cite as: Chen, Z., Lin, Y. (2020) Improving X-Vector and PLDA for Text-Dependent Speaker Verification. Proc. Interspeech 2020, 726-730, DOI: 10.21437/Interspeech.2020-1188.

  author={Zhuxin Chen and Yue Lin},
  title={{Improving X-Vector and PLDA for Text-Dependent Speaker Verification}},
  booktitle={Proc. Interspeech 2020},