Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges

Maxim Markitantov, Denis Dresvyanskiy, Danila Mamontov, Heysem Kaya, Wolfgang Minker, Alexey Karpov


This paper describes deep learning approaches for the Mask and Breathing Sub-Challenges (SCs) of the INTERSPEECH 2020 Computational Paralinguistics Challenge. Motivated by the outstanding performance of state-of-the-art end-to-end (E2E) approaches, we explore and compare the effectiveness of different deep Convolutional Neural Network (CNN) architectures on raw data, log-Mel spectrograms, and Mel-Frequency Cepstral Coefficients. We apply transfer learning to improve the models' efficiency and convergence speed. In the Mask SC, we conduct experiments with several pretrained CNN architectures on log-Mel spectrograms, as well as with Support Vector Machines on baseline features. For the Breathing SC, we propose an ensemble deep learning system that exploits E2E learning and sequence prediction. The E2E model is based on a 1D CNN operating on raw speech signals, coupled with Long Short-Term Memory layers for sequence modeling. The second model works with log-Mel features and is based on a pretrained 2D CNN stacked with Gated Recurrent Unit layers. To increase the performance of our models in both SCs, we use ensembles of the best deep neural models obtained from N-fold cross-validation on the combined challenge training and development datasets. Our results markedly outperform the challenge test set baselines in both SCs.
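The cross-validation ensembling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the fold models are fused, so plain averaging of class probabilities is an assumption here, and the helper names kfold_indices and average_ensemble are hypothetical, not taken from the paper.

```python
from statistics import fmean


def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous (train, val) index pairs."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds


def average_ensemble(fold_predictions):
    """Average class probabilities over fold models (assumed fusion rule).

    fold_predictions[m][s][c] is the probability that model m assigns
    to class c for test sample s.
    """
    n_samples = len(fold_predictions[0])
    n_classes = len(fold_predictions[0][0])
    return [[fmean(model[s][c] for model in fold_predictions)
             for c in range(n_classes)]
            for s in range(n_samples)]
```

In use, one model would be trained per (train, val) pair on the combined training and development data, the best checkpoint of each fold would predict on the test set, and average_ensemble would fuse those predictions.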


DOI: 10.21437/Interspeech.2020-2666

Cite as: Markitantov, M., Dresvyanskiy, D., Mamontov, D., Kaya, H., Minker, W., Karpov, A. (2020) Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges. Proc. Interspeech 2020, 2072-2076, DOI: 10.21437/Interspeech.2020-2666.


@inproceedings{Markitantov2020,
  author={Maxim Markitantov and Denis Dresvyanskiy and Danila Mamontov and Heysem Kaya and Wolfgang Minker and Alexey Karpov},
  title={{Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2072--2076},
  doi={10.21437/Interspeech.2020-2666},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2666}
}