Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition

Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li


Emotion recognition remains a challenging task due to speaker variability and limited training data. To address these difficulties, we focus on domain adversarial neural networks (DANN) for emotion recognition. The primary task is to predict emotion labels. The secondary task is to learn a common representation in which speaker identities cannot be distinguished. This approach brings the representations of different speakers closer together, and by incorporating unlabeled data into training, it alleviates the impact of limited training data. Prior work has also found that contextual information and multimodal features are important for emotion recognition; however, previous DANN-based approaches ignore this information, limiting their performance. In this paper, we propose the context-dependent domain adversarial neural network for multimodal emotion recognition. To verify the effectiveness of the proposed method, we conduct experiments on the benchmark IEMOCAP dataset. Experimental results demonstrate that the proposed method yields an absolute improvement of 3.48% over state-of-the-art strategies.
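The adversarial setup described above is typically realized with a gradient reversal layer: the shared feature extractor descends on the emotion loss while ascending on the speaker loss. Below is a minimal NumPy sketch of that idea under assumed toy dimensions and squared-error losses (the actual model uses classification heads, contextual encoders, and multimodal inputs; all names, sizes, and the reversal strength `lam` here are illustrative, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only, not from the paper)
d_in, d_feat = 8, 4
W = rng.standard_normal((d_feat, d_in)) * 0.1   # shared feature extractor
v_emo = rng.standard_normal(d_feat) * 0.1       # emotion head (scalar toy output)
v_spk = rng.standard_normal(d_feat) * 0.1       # speaker head (the adversary)

x = rng.standard_normal(d_in)
y_emo, y_spk = 1.0, 0.0                         # toy regression targets
lam = 0.5                                       # reversal strength lambda

# Forward: the shared features feed both heads
h = W @ x
err_emo = v_emo @ h - y_emo
err_spk = v_spk @ h - y_spk

# Gradients of the squared losses 0.5 * err**2 w.r.t. the shared W
g_emo = np.outer(err_emo * v_emo, x)
g_spk = np.outer(err_spk * v_spk, x)

# Gradient reversal: the speaker gradient is negated before reaching W,
# so W minimizes the emotion loss while maximizing the speaker loss,
# pushing the features toward speaker-indistinguishability.
g_total = g_emo - lam * g_spk
W_new = W - 0.1 * g_total
```

In practice the same sign flip is implemented as an autograd layer that is the identity in the forward pass and multiplies incoming gradients by -lambda in the backward pass, so a single backward call trains both heads and the shared extractor jointly.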


 DOI: 10.21437/Interspeech.2020-1705

Cite as: Lian, Z., Tao, J., Liu, B., Huang, J., Yang, Z., Li, R. (2020) Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. Proc. Interspeech 2020, 394-398, DOI: 10.21437/Interspeech.2020-1705.


@inproceedings{Lian2020,
  author={Zheng Lian and Jianhua Tao and Bin Liu and Jian Huang and Zhanlei Yang and Rongjun Li},
  title={{Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={394--398},
  doi={10.21437/Interspeech.2020-1705},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1705}
}