Speech Emotion Recognition Based on Multi-Label Emotion Existence Model

Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono

This paper presents a novel speech emotion recognition method that addresses the ambiguous nature of emotions in speech. Most conventional methods assume there is only a single ground truth, the dominant emotion, even though an utterance can contain multiple emotions. To address this problem, several methods that consider ambiguous emotions (e.g., soft-target training) have been proposed. Unfortunately, training them is difficult because they work by estimating the proportions of all emotions. The proposed method improves both the hard-target and soft-target frameworks by evaluating the presence or absence of each emotion. We expect that estimating just the presence or absence of each emotion is much easier than determining the proportion of each, and that deliberately assessing emotion existence information will help to estimate the proportions or the dominant class more precisely. The proposed method employs two-step training. First, a Multi-Label Emotion Existence (MLEE) model is trained to estimate whether each emotion is present or absent. Then, the dominant emotion recognition model, with hard- or soft-target labels, is trained using the intermediate outputs of the MLEE model so that cues of emotion existence can be used to infer the dominant emotion. Experiments demonstrate that the proposed method outperforms both hard- and soft-target based conventional emotion recognition schemes.
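The three kinds of training target contrasted in the abstract (hard, soft, and emotion existence) can be illustrated with a minimal sketch. It assumes each utterance comes with a list of per-annotator emotion votes, which the abstract does not state. The label set `EMOTIONS`, the function names, and the `threshold` parameter are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def soft_target(votes):
    """Proportion of annotator votes for each emotion (sums to 1)."""
    counts = Counter(votes)
    return [counts[e] / len(votes) for e in EMOTIONS]


def hard_target(votes):
    """One-hot vector for the dominant (most-voted) emotion.

    Ties are broken by the order of EMOTIONS, an arbitrary choice here.
    """
    counts = Counter(votes)
    dominant = max(EMOTIONS, key=lambda e: counts[e])
    return [1 if e == dominant else 0 for e in EMOTIONS]


def existence_target(votes, threshold=0.0):
    """Multi-label vector: 1 if an emotion's vote share exceeds threshold.

    With threshold=0.0 (an assumption), one vote marks an emotion as present.
    """
    return [1 if p > threshold else 0 for p in soft_target(votes)]


votes = ["happy", "happy", "neutral"]  # made-up example annotations
print(hard_target(votes))
print(soft_target(votes))
print(existence_target(votes))
```

Under these assumptions the existence target only has to record which emotions appear at all, whereas the soft target has to match their exact vote shares, which is the difference in difficulty the abstract argues for.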

DOI: 10.21437/Interspeech.2019-2524

Cite as: Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y. (2019) Speech Emotion Recognition Based on Multi-Label Emotion Existence Model. Proc. Interspeech 2019, 2818-2822, DOI: 10.21437/Interspeech.2019-2524.

  author={Atsushi Ando and Ryo Masumura and Hosana Kamiyama and Satoshi Kobashikawa and Yushi Aono},
  title={{Speech Emotion Recognition Based on Multi-Label Emotion Existence Model}},
  booktitle={Proc. Interspeech 2019},