On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition

Magdalena Rybicka, Konrad Kowalczyk


In various classification tasks, a major challenge lies in generating discriminative representations of classes. By properly selecting the deep neural network (DNN) loss function, we can encourage the network to produce embeddings with increased inter-class separation and reduced intra-class distances. In this paper, we develop a softmax-based cross-entropy loss function which adapts its parameters to the current training phase. The proposed solution improves accuracy by up to 24% in terms of Equal Error Rate (EER) and minimum Detection Cost Function (minDCF). In addition, our proposal also accelerates network convergence compared with other state-of-the-art softmax-based losses. As an additional contribution of this paper, we adopt and subsequently modify the ResNet DNN structure for the speaker recognition task. The proposed ResNet network achieves relative gains of up to 32% and 15% in terms of EER and minDCF, respectively, compared with the well-established Time Delay Neural Network (TDNN) architecture for x-vector extraction.
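To illustrate the general idea of a softmax-based cross-entropy loss whose parameters are adapted to the training phase, the sketch below implements an additive-angular-margin softmax (AAM-softmax style) in which the margin ramps up linearly over training. This is a generic, hedged example: the linear schedule, the `scale` and `max_margin` values, and the function name are illustrative assumptions, not the specific parameter-adaptation scheme proposed in the paper.

```python
import math

def adaptive_margin_softmax_loss(embeddings, weights, labels, step, total_steps,
                                 scale=30.0, max_margin=0.2):
    """Additive-angular-margin softmax cross-entropy with a training-phase-
    dependent margin. Linear ramp schedule is illustrative only."""

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    # Margin grows linearly from 0 to max_margin over training
    # (assumed schedule; the paper's actual adaptation may differ).
    m = max_margin * min(step / total_steps, 1.0)

    losses = []
    for emb, y in zip(embeddings, labels):
        e = normalize(emb)
        # Cosine similarity between the embedding and each class weight vector.
        cosines = [sum(a * b for a, b in zip(e, normalize(w))) for w in weights]
        logits = []
        for j, c in enumerate(cosines):
            if j == y:
                # Penalize the target class by adding the angular margin m.
                theta = math.acos(max(-1.0, min(1.0, c)))
                logits.append(scale * math.cos(theta + m))
            else:
                logits.append(scale * c)
        # Numerically stable softmax cross-entropy for the true class.
        mx = max(logits)
        log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
        losses.append(log_z - logits[y])
    return sum(losses) / len(losses)
```

Because the margin is zero early in training, the loss initially behaves like a plain scaled-cosine softmax, and the inter-class separation pressure increases only as training progresses, which is one plausible way to trade early convergence speed against late-stage discriminability.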


DOI: 10.21437/Interspeech.2020-2264

Cite as: Rybicka, M., Kowalczyk, K. (2020) On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition. Proc. Interspeech 2020, 3805-3809, DOI: 10.21437/Interspeech.2020-2264.


@inproceedings{Rybicka2020,
  author={Magdalena Rybicka and Konrad Kowalczyk},
  title={{On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3805--3809},
  doi={10.21437/Interspeech.2020-2264},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2264}
}