Meta Multi-Task Learning for Speech Emotion Recognition

Ruichu Cai, Kaibin Guo, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang

Most existing Speech Emotion Recognition (SER) approaches ignore the relationship between the categorical emotional labels and the dimensional labels in valence, activation or dominance space. Although multi-task learning has recently been introduced to explore such auxiliary tasks of SER, existing approaches only share the feature extractor under the traditional multi-task learning framework and cannot efficiently transfer knowledge from the auxiliary tasks to the target task. To address these issues, we propose a Meta Multi-task Learning method for SER that combines multi-task learning with meta learning. Our contributions include: 1) to model the relationship among auxiliary tasks, we extend the task generation of meta learning to the form of multiple tasks, and 2) to transfer knowledge from the auxiliary tasks to the target task, we propose a tuning-based transfer training mechanism in the meta learning framework. Experiments on IEMOCAP show that our approach outperforms the state-of-the-art solution (UA: 70.32%, WA: 76.64%).
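The abstract reports two scores, UA and WA. As a minimal sketch, assuming the definitions commonly used in SER work (WA as overall accuracy across all utterances, UA as the mean of per-class recalls), the two can be computed as follows. The function names and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly recognized utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, made-up labels (assumption): an imbalanced set with two true classes.
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "sad"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On imbalanced data the two scores diverge, because WA is dominated by the frequent classes while UA weights each class equally. This is presumably why both are reported.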

DOI: 10.21437/Interspeech.2020-2624

Cite as: Cai, R., Guo, K., Xu, B., Yang, X., Zhang, Z. (2020) Meta Multi-Task Learning for Speech Emotion Recognition. Proc. Interspeech 2020, 3336-3340, DOI: 10.21437/Interspeech.2020-2624.

  author={Ruichu Cai and Kaibin Guo and Boyan Xu and Xiaoyan Yang and Zhenjie Zhang},
  title={{Meta Multi-Task Learning for Speech Emotion Recognition}},
  booktitle={Proc. Interspeech 2020},