Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge

Ziqing Yang, Zifan An, Zehao Fan, Chengye Jing, Houwei Cao


In this paper, we investigate various acoustic and lexical features for the INTERSPEECH 2020 Computational Paralinguistic Challenge. For the acoustic analysis, we show that the proposed FV-MFCC feature is very promising: it has strong predictive power on its own and provides complementary information when fused with other acoustic features. For the lexical representation, we find that the corpus-dependent TF.IDF feature is by far the best representation. We also explore several model fusion techniques to combine the different modalities, and propose novel SVM models that aggregate chunk-level predictions into narrative-level predictions based on chunk-level decision functionals. Finally, we discuss the potential for improving prediction by combining the lexical and acoustic modalities: this fusion does not lead to consistent improvements on Elderly Arousal, but it substantially improves Elderly Valence prediction. Our methods significantly outperform the official baselines on the test set in the Mask and Elderly Sub-challenges, the two in which we participated. We obtain an unweighted average recall (UAR) of 75.1%, 54.3%, and 59.0% on the Mask, Elderly Arousal, and Elderly Valence prediction tasks, respectively.
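The results above are reported as UAR, that is, the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric, not the challenge's official scoring script; the function name `unweighted_average_recall` and the toy labels are illustrative assumptions.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper name)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall is computed per class, then averaged without class weighting.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (not challenge data): three "mask" and one "clear" sample.
y_true = ["mask", "mask", "mask", "clear"]
y_pred = ["mask", "mask", "clear", "clear"]
uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this toy input the per-class recalls are 2/3 and 1, so UAR is 5/6 while plain accuracy is 3/4. The gap shows why UAR is the usual choice when classes are imbalanced.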


DOI: 10.21437/Interspeech.2020-2999

Cite as: Yang, Z., An, Z., Fan, Z., Jing, C., Cao, H. (2020) Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge. Proc. Interspeech 2020, 2092-2096, DOI: 10.21437/Interspeech.2020-2999.


@inproceedings{Yang2020,
  author={Ziqing Yang and Zifan An and Zehao Fan and Chengye Jing and Houwei Cao},
  title={{Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2092--2096},
  doi={10.21437/Interspeech.2020-2999},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2999}
}