Dynamic Margin Softmax Loss for Speaker Verification

Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang, Jianguo Wei


We propose a dynamic-margin softmax loss for training deep speaker-embedding neural networks. Our proposal is inspired by the additive-margin softmax (AM-Softmax) loss reported earlier. AM-Softmax applies a constant margin to all training samples. However, the angle between the feature vector and the ground-truth class center is rarely the same for all samples, and it also changes during training. It is therefore more reasonable to set a dynamic margin for each training sample. In this paper, we set the margin of each training sample dynamically, commensurate with the cosine angle of that sample; hence the name dynamic-additive-margin softmax (DAM-Softmax) loss. More specifically, the smaller the cosine angle is, the larger the margin between the training sample and the corresponding class in the feature space should be, to promote intra-class compactness. Experimental results show that the proposed DAM-Softmax loss achieves state-of-the-art performance on the VoxCeleb dataset, with an equal error rate (EER) of 1.94%. In addition, our method outperforms AM-Softmax loss when evaluated on the Speakers in the Wild (SITW) corpus.
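
As a rough illustration of the idea described above, the following NumPy sketch compares a constant additive margin with a per-sample margin. It is not the formulation from the paper: the abstract does not give the margin function, so `dynamic_margins` is a hypothetical linear rule chosen only so that a smaller cosine to the ground-truth class yields a larger margin. Reading "cosine angle" as the cosine similarity between the normalized embedding and its class center is also an assumption, as are the names `margin_softmax_loss`, `scale`, and `base_margin` and their default values.

```python
import numpy as np


def dynamic_margins(target_cos, base_margin):
    """Hypothetical per-sample margin (NOT the paper's formula).

    Grows linearly from base_margin (cos = 1) to 2 * base_margin (cos = -1),
    so a smaller target cosine gives a larger margin.
    """
    return base_margin * (1.0 + (1.0 - target_cos) / 2.0)


def margin_softmax_loss(embeddings, centers, labels,
                        scale=30.0, base_margin=0.2, dynamic=True):
    """Additive-margin softmax loss with a constant or per-sample margin."""
    # L2-normalize embeddings and class centers so their dot product is a cosine.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = x @ w.T                                  # shape (N, C)

    rows = np.arange(len(labels))
    target_cos = cos[rows, labels]                 # cosine to the true class

    if dynamic:
        margins = dynamic_margins(target_cos, base_margin)
    else:
        margins = np.full_like(target_cos, base_margin)  # constant margin

    # Subtract the margin from the target-class cosine only, then scale.
    logits = scale * cos
    logits[rows, labels] = scale * (target_cos - margins)

    # Numerically stable log-softmax followed by mean cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))       # 8 toy embeddings of dimension 16
    cen = rng.normal(size=(4, 16))       # 4 toy speaker classes
    lab = rng.integers(0, 4, size=8)
    print("constant margin:", margin_softmax_loss(emb, cen, lab, dynamic=False))
    print("dynamic margin: ", margin_softmax_loss(emb, cen, lab, dynamic=True))
```

With this particular rule the per-sample margin is never smaller than `base_margin`, so on the same batch the dynamic variant gives a loss at least as large as the constant-margin one. That is a property of the illustrative rule, not a claim about the paper's method.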


DOI: 10.21437/Interspeech.2020-1106

Cite as: Zhou, D., Wang, L., Lee, K.A., Wu, Y., Liu, M., Dang, J., Wei, J. (2020) Dynamic Margin Softmax Loss for Speaker Verification. Proc. Interspeech 2020, 3800-3804, DOI: 10.21437/Interspeech.2020-1106.


@inproceedings{Zhou2020,
  author={Dao Zhou and Longbiao Wang and Kong Aik Lee and Yibo Wu and Meng Liu and Jianwu Dang and Jianguo Wei},
  title={{Dynamic Margin Softmax Loss for Speaker Verification}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3800--3804},
  doi={10.21437/Interspeech.2020-1106},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1106}
}