Speech Emotion Recognition with Discriminative Feature Learning

Huan Zhou, Kai Liu


The performance of a speech emotion recognition (SER) system relies heavily on the deep features learned from speech. Most state-of-the-art work has focused on developing various deep architectures for effective feature learning. In this study, we make the first attempt to explore feature discriminability instead. Building on our SER baseline system, we propose three approaches to enhance feature discriminability: two based on loss functions and one based on combined attentive pooling. Evaluations on the IEMOCAP database consistently validate the effectiveness of all three proposals. Compared to the baseline system, the three proposed systems achieve absolute accuracy improvements of at least 4.0%, with no increase in the total number of parameters.


DOI: 10.21437/Interspeech.2020-2237

Cite as: Zhou, H., Liu, K. (2020) Speech Emotion Recognition with Discriminative Feature Learning. Proc. Interspeech 2020, 4094-4097, DOI: 10.21437/Interspeech.2020-2237.


@inproceedings{Zhou2020,
  author={Huan Zhou and Kai Liu},
  title={{Speech Emotion Recognition with Discriminative Feature Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4094--4097},
  doi={10.21437/Interspeech.2020-2237},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2237}
}