Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition

Kusha Sridhar, Carlos Busso


Reliable and generalizable speech emotion recognition (SER) systems have wide applications in fields such as healthcare, customer service, and security and defense. Towards this goal, this study presents a novel teacher-student (T-S) framework for SER, relying on an ensemble of probabilistic predictions of teacher embeddings to train an ensemble of students. We use uncertainty modeling with Monte-Carlo (MC) dropout to create a distribution for the embeddings of an intermediate dense layer of the teacher. The embeddings guiding the student models are derived by sampling from this distribution. The final prediction combines the results obtained by the student ensemble. The proposed model not only increases the prediction performance over the teacher model, but also generates more consistent predictions. As a T-S formulation, the approach allows the use of unlabeled data to improve the performance of the students in a semi-supervised manner. An ablation analysis shows the importance of the MC-based ensemble and the use of unlabeled data. The results show relative improvements in concordance correlation coefficient (CCC) of up to 4.25% for arousal, 2.67% for valence and 4.98% for dominance over the respective baselines. The results also show that the student ensemble decreases the uncertainty in the predictions, leading to more consistent results.
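The two technical ingredients named in the abstract, MC dropout over a teacher's dense layer and the CCC metric, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single ReLU dense layer, the per-dimension Gaussian fitted to the MC samples, and the names `mc_dropout_embeddings`, `sample_student_targets`, `n_passes` and `drop_rate` are all assumptions made for illustration. Only the CCC formula is the standard definition.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def mc_dropout_embeddings(x, weights, bias, n_passes=50, drop_rate=0.5, rng=None):
    """Stochastic forward passes through one dense ReLU layer, dropout kept on.

    Hypothetical stand-in for the teacher's intermediate dense layer.
    Returns an array of shape (n_passes, batch, emb_dim).
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(n_passes):
        # Inverted dropout on the layer input: zero units, rescale the rest.
        mask = rng.random(x.shape) >= drop_rate
        h = (x * mask / (1.0 - drop_rate)) @ weights + bias
        samples.append(np.maximum(h, 0.0))
    return np.stack(samples)


def sample_student_targets(samples, n_students, rng=None):
    """Draw one embedding target per student from the MC distribution.

    Assumption: the distribution is summarized as an independent Gaussian
    per embedding dimension (mean and standard deviation over the passes).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    return [rng.normal(mean, std) for _ in range(n_students)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))        # 4 utterances, 8 input features
    w = rng.normal(size=(8, 16))       # dense layer with 16 embedding units
    b = np.zeros(16)
    samples = mc_dropout_embeddings(x, w, b, n_passes=20, rng=rng)
    targets = sample_student_targets(samples, n_students=3, rng=rng)
    print(samples.shape, len(targets), targets[0].shape)
```

Each student would then be trained to match its own sampled embedding, and the students' emotional attribute predictions would be combined and scored with `ccc` against the reference labels. How the predictions are combined is not specified in the abstract.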


DOI: 10.21437/Interspeech.2020-2694

Cite as: Sridhar, K., Busso, C. (2020) Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition. Proc. Interspeech 2020, 516-520, DOI: 10.21437/Interspeech.2020-2694.


@inproceedings{Sridhar2020,
  author={Kusha Sridhar and Carlos Busso},
  title={{Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={516--520},
  doi={10.21437/Interspeech.2020-2694},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2694}
}