Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network

Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee


The manner in which humans encode emotional information within an utterance is often complex and can yield diverse salient acoustic profiles conditioned on the emotion type. In this work, we propose a framework that imposes a graph attention mechanism on a bi-directional gated recurrent unit network (GA-GRU) to improve utterance-based speech emotion recognition (SER). The proposed GA-GRU combines long-range time-series modeling of speech with a graph structure that integrates complex saliency. We evaluate the GA-GRU on the IEMOCAP and MSP-IMPROV databases and achieve 63.8% UAR and 57.47% UAR, respectively, on a four-class emotion recognition task. The GA-GRU obtains consistently better performance than recent state-of-the-art per-utterance emotion classification models, and we further observe that different emotion categories require distinct, flexible structures for modeling emotional information in acoustic data, beyond conventional left-to-right (or right-to-left) processing.
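The abstract does not spell out the architecture, but a minimal sketch can illustrate the general idea it describes: run a bi-directional GRU over frame-level acoustic features, treat the hidden states as graph nodes, apply graph attention so salient frames can attend to each other beyond strict temporal order, and pool for an utterance-level prediction. Everything below is an illustrative assumption rather than the authors' exact GA-GRU: the GraphAttentionLayer and GAGRUSketch names, the layer sizes, the fully connected frame graph, and the mean pooling are all hypothetical choices (PyTorch).

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    # Single-head graph attention in the style of Velickovic et al. (2018);
    # assumed here as the attention component, not taken from the paper.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) 0/1 adjacency mask.
        Wh = self.W(h)                                    # (N, out_dim)
        N = Wh.size(0)
        # Attention logits e_ij = LeakyReLU(a([Wh_i || Wh_j])).
        e = F.leaky_relu(self.a(torch.cat(
            [Wh.unsqueeze(1).expand(N, N, -1),
             Wh.unsqueeze(0).expand(N, N, -1)], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)                  # (N, N) weights
        return F.elu(alpha @ Wh)                          # (N, out_dim)

class GAGRUSketch(nn.Module):
    # Hypothetical feature dimension, hidden size, and four emotion classes.
    def __init__(self, feat_dim=40, hid_dim=128, n_classes=4):
        super().__init__()
        self.bigru = nn.GRU(feat_dim, hid_dim,
                            batch_first=True, bidirectional=True)
        self.gat = GraphAttentionLayer(2 * hid_dim, hid_dim)
        self.clf = nn.Linear(hid_dim, n_classes)

    def forward(self, x):
        # x: (T, feat_dim) frame-level acoustic features of one utterance.
        h, _ = self.bigru(x.unsqueeze(0))                 # (1, T, 2*hid_dim)
        h = h.squeeze(0)                                  # (T, 2*hid_dim)
        # A fully connected graph over frames (an assumption) lets attention
        # learn salient links beyond left-to-right temporal order.
        adj = torch.ones(h.size(0), h.size(0))
        g = self.gat(h, adj)                              # (T, hid_dim)
        return self.clf(g.mean(dim=0))                    # emotion logits

model = GAGRUSketch()
logits = model(torch.randn(200, 40))  # e.g. 200 frames of 40-dim features

The fully connected adjacency is the simplest choice that matches the abstract's point about flexible structure; a sparser, learned, or emotion-dependent graph would be closer in spirit to the paper's findings.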


DOI: 10.21437/Interspeech.2020-1733

Cite as: Su, B.-H., Chang, C.-M., Lin, Y.-S., Lee, C.-C. (2020) Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. Proc. Interspeech 2020, 506-510, DOI: 10.21437/Interspeech.2020-1733.


@inproceedings{Su2020,
  author={Bo-Hao Su and Chun-Min Chang and Yun-Shao Lin and Chi-Chun Lee},
  title={{Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={506--510},
  doi={10.21437/Interspeech.2020-1733},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1733}
}