Attention-based Sequence Classification for Affect Detection

Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani, John Kane

This paper presents the Cogito submission to the Interspeech Computational Paralinguistics Challenge (ComParE), for the second sub-challenge. The aim of this sub-challenge is to recognize self-assessed affect from short clips of speech-containing audio data. We adopt a sequence classification approach in which a long short-term memory (LSTM) network models the evolution of low-level spectral coefficients, with an added attention mechanism to emphasize salient regions of the audio clip. Additionally, to deal with the underrepresentation of the negative valence class, we use a combination of mitigation strategies including oversampling and loss function weighting. Our experiments demonstrate improvements in detection accuracy when the attention mechanism and class balancing strategies are combined, with the best models outperforming the best single challenge baseline model.
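The two core ideas in the abstract — attention pooling over recurrent hidden states and class-weighted loss for the minority class — can be sketched as follows. This is a minimal NumPy illustration of the general techniques, not the authors' implementation; the hidden states stand in for LSTM outputs, and the scoring vector `w` and function names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(h, w):
    """Attention pooling over a sequence of hidden states.
    h: (T, d) array of per-frame hidden states (stand-in for LSTM outputs);
    w: (d,) learned scoring vector.
    Returns the (d,) weighted summary and the (T,) attention weights."""
    scores = h @ w            # one salience score per frame
    alpha = softmax(scores)   # attention weights; sum to 1 over time
    return alpha @ h, alpha   # weighted sum emphasizes salient frames

def weighted_ce(probs, y, class_weights):
    """Class-weighted cross-entropy: up-weights errors on the
    underrepresented class (e.g. negative valence)."""
    return -class_weights[y] * np.log(probs[y])

rng = np.random.default_rng(0)
T, d = 50, 16                          # e.g. 50 frames of 16-dim features
h = rng.standard_normal((T, d))        # stand-in for LSTM hidden states
w = rng.standard_normal(d)
pooled, alpha = attention_pool(h, w)   # clip-level representation
loss = weighted_ce(np.array([0.7, 0.3]), 1, np.array([1.0, 3.0]))
```

The attention weights form a distribution over time, so the pooled vector is a convex combination of frame-level states; frames with high scores dominate the clip-level representation fed to the classifier.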

 DOI: 10.21437/Interspeech.2018-1610

Cite as: Gorrostieta, C., Brutti, R., Taylor, K., Shapiro, A., Moran, J., Azarbayejani, A., Kane, J. (2018) Attention-based Sequence Classification for Affect Detection. Proc. Interspeech 2018, 506-510, DOI: 10.21437/Interspeech.2018-1610.

@inproceedings{gorrostieta18_interspeech,
  author={Cristina Gorrostieta and Richard Brutti and Kye Taylor and Avi Shapiro and Joseph Moran and Ali Azarbayejani and John Kane},
  title={Attention-based Sequence Classification for Affect Detection},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={506--510},
  doi={10.21437/Interspeech.2018-1610}
}