Role of Regularization in the Prediction of Valence from Speech

Kusha Sridhar, Srinivas Parthasarathy, Carlos Busso

Regularization plays a key role in improving the prediction of emotions described by attributes such as arousal, valence and dominance. It is particularly important with deep neural networks (DNNs), which have millions of parameters. While previous studies have reported competitive performance for arousal and dominance, the prediction results for valence using acoustic features are significantly lower. We hypothesize that higher regularization can lead to better results for valence. This study explores the role of dropout as a form of regularization for valence, and the results suggest the need for higher regularization. We analyze the performance of regression models for valence, arousal and dominance as a function of the dropout probability. We observe that the optimum dropout rates are consistent for arousal and dominance. However, the optimum dropout rate for valence is higher. To understand the need for higher regularization for valence, we perform an empirical analysis to explore the nature of emotional cues conveyed in speech. We compare regression models with speaker-dependent and speaker-independent partitions for training and testing. The experimental evaluation suggests stronger speaker-dependent traits for valence. We conclude that higher regularization is needed for valence to force the network to learn global patterns that generalize across speakers.
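
The distinction between speaker-dependent and speaker-independent partitions can be illustrated with a minimal sketch. The abstract does not state the corpus, the split ratios, or the exact partitioning procedure, so the function names, the `test_fraction` parameter, and the toy speaker labels below are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def speaker_independent_split(speaker_ids, test_fraction=0.3, seed=0):
    """Hold out whole speakers: no speaker appears in both train and test.

    Hypothetical helper; assumes at least two distinct speakers.
    """
    rng = random.Random(seed)
    speakers = sorted(set(speaker_ids))
    rng.shuffle(speakers)
    # Keep at least one speaker on each side of the split.
    n_test = min(len(speakers) - 1, max(1, round(len(speakers) * test_fraction)))
    test_speakers = set(speakers[:n_test])
    train = [i for i, s in enumerate(speaker_ids) if s not in test_speakers]
    test = [i for i, s in enumerate(speaker_ids) if s in test_speakers]
    return train, test


def speaker_dependent_split(speaker_ids, test_fraction=0.3, seed=0):
    """Hold out part of each speaker's utterances.

    Every speaker with at least two utterances appears in both sets.
    Hypothetical helper, not the procedure used in the paper.
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for i, s in enumerate(speaker_ids):
        by_speaker[s].append(i)
    train, test = [], []
    for s in sorted(by_speaker):
        idx = by_speaker[s][:]
        rng.shuffle(idx)
        if len(idx) > 1:
            n_test = min(len(idx) - 1, max(1, round(len(idx) * test_fraction)))
        else:
            n_test = 0  # a single utterance stays in the training set
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy data: 50 utterances from 5 made-up speakers.
speaker_ids = ["spk%d" % (i % 5) for i in range(50)]
si_train, si_test = speaker_independent_split(speaker_ids)
sd_train, sd_test = speaker_dependent_split(speaker_ids)
```

Under the speaker-independent split, a model is tested only on voices it has never seen, so any gain from memorizing speaker-specific cues disappears. Comparing the two conditions is one way to probe how much of the predictable signal is tied to individual speakers.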

DOI: 10.21437/Interspeech.2018-2508

Cite as: Sridhar, K., Parthasarathy, S., Busso, C. (2018) Role of Regularization in the Prediction of Valence from Speech. Proc. Interspeech 2018, 941-945, DOI: 10.21437/Interspeech.2018-2508.

  author={Kusha Sridhar and Srinivas Parthasarathy and Carlos Busso},
  title={Role of Regularization in the Prediction of Valence from Speech},
  booktitle={Proc. Interspeech 2018},