Angular Margin Centroid Loss for Text-Independent Speaker Recognition

Yuheng Wei, Junzhao Du, Hui Liu

Recognizing speakers that are not in the training dataset relies on the discriminative power of speaker embeddings. Recent studies use angular softmax losses with angular margin penalties to enhance the intra-class compactness of speaker embeddings, which yields a clear performance improvement. However, in these losses the classification layer suffers from dimension explosion as the number of training speakers grows. In this paper, like the prototypical network loss in few-shot learning and the generalized end-to-end loss, we optimize the cosine distances between speaker embeddings and their corresponding centroids rather than the weight vectors in the classification layer. For intra-class compactness, we impose an additive angular margin to shorten the cosine distance between speaker embeddings belonging to the same speaker. Meanwhile, we also explicitly improve inter-class separability by enlarging the cosine distance between different speaker centroids. Experiments show that our loss achieves performance comparable to the state-of-the-art angular margin softmax loss in both verification and identification tasks and markedly reduces the number of training iterations.
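The abstract gives no formulas, so the following is only a minimal NumPy sketch of the idea as described above, not the authors' exact loss. It assumes a batch of N speakers with M utterances each, computes each centroid as the normalized mean of that speaker's embeddings, applies an additive angular margin to the angle between an embedding and its own centroid, and adds a penalty on the cosine similarity between different centroids. The function name and the parameters `margin`, `scale`, and `inter_weight` are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def angular_margin_centroid_loss(emb, margin=0.2, scale=30.0, inter_weight=1.0):
    """Illustrative centroid-based loss (assumed form, not the paper's).

    emb: array of shape (N, M, D) with N >= 2 speakers, M utterances
         per speaker, and D-dimensional embeddings.
    """
    emb = l2_normalize(emb)
    n_spk = emb.shape[0]

    # One centroid per speaker: normalized mean of its embeddings.
    centroids = l2_normalize(emb.mean(axis=1))            # (N, D)

    # Cosine similarity of every embedding to every centroid.
    cos = np.einsum("nmd,kd->nmk", emb, centroids)        # (N, M, N)
    cos = np.clip(cos, -1.0 + 1e-7, 1.0 - 1e-7)

    # Additive angular margin on the angle to the embedding's own centroid.
    own = np.eye(n_spk, dtype=bool)[:, None, :]           # (N, 1, N)
    logits = scale * np.where(own, np.cos(np.arccos(cos) + margin), cos)

    # Softmax cross-entropy over centroids (intra-class term).
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    idx = np.arange(n_spk)
    intra = -log_prob[idx, :, idx].mean()

    # Inter-class term: mean cosine similarity between different centroids;
    # minimizing it enlarges the cosine distance between centroids.
    sim = centroids @ centroids.T
    inter = sim[~np.eye(n_spk, dtype=bool)].mean()

    return intra + inter_weight * inter
```

Because the logits are computed against N batch centroids rather than against a weight vector per training speaker, the size of the comparison does not grow with the total number of speakers in the training set, which is the motivation stated in the abstract.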

DOI: 10.21437/Interspeech.2020-2538

Cite as: Wei, Y., Du, J., Liu, H. (2020) Angular Margin Centroid Loss for Text-Independent Speaker Recognition. Proc. Interspeech 2020, 3820-3824, DOI: 10.21437/Interspeech.2020-2538.

  author={Yuheng Wei and Junzhao Du and Hui Liu},
  title={{Angular Margin Centroid Loss for Text-Independent Speaker Recognition}},
  booktitle={Proc. Interspeech 2020},