State-of-the-art dimensional speech emotion recognition systems are trained on continuously labelled instances, and labelling such data is labour-intensive and time-consuming. In this paper, we propose to apply active learning to reduce this effort: the unlabelled instances are evaluated automatically, and only the most informative ones, as ranked by an informativeness measure, are selected for a human to label. Specifically, we estimate the informativeness of each unlabelled instance from a binary-classification confidence score, i.e. the confidence with which its emotion is predicted to be negative or positive on a given emotional dimension. For verification, we consider a pool-based and a stream-based scenario, run on part of the continuous AVEC 2012 task, to demonstrate the feasibility of the proposed approach in practice. The results show that, in both scenarios, our approach requires significantly fewer human-labelled instances than passive learning to reach a given performance.
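As an illustration of confidence-based instance selection, the following minimal Python sketch shows how it might look in the pool-based and stream-based scenarios. It is not the paper's implementation. The abstract does not give the exact informativeness measure, so the sketch assumes that a binary classifier outputs a probability of the positive class for a given emotional dimension, that confidence is the distance of that probability from 0.5, and that low confidence means high informativeness (plain uncertainty sampling). The function names, the threshold, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def confidence(p_positive: float) -> float:
    """Assumed confidence score: 0.0 at p = 0.5, 1.0 at p = 0 or p = 1."""
    return abs(p_positive - 0.5) * 2.0


def select_queries(pool_probs: Sequence[float], batch_size: int) -> List[int]:
    """Pool-based scenario: rank the whole unlabelled pool and return the
    indices of the `batch_size` least-confident instances for human labelling."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: confidence(pool_probs[i]))
    return ranked[:batch_size]


def stream_should_query(p_positive: float, threshold: float) -> bool:
    """Stream-based scenario: decide for one incoming instance whether to ask
    a human for its label (hypothetical fixed confidence threshold)."""
    return confidence(p_positive) < threshold


# Hypothetical positive-class probabilities for five unlabelled instances.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
picked = select_queries(probs, batch_size=2)
print(picked)  # indices of the two instances closest to p = 0.5
```

In a full active learning loop, the selected instances would be labelled by a human, added to the training set, and the model retrained before the next selection round. Passive learning would instead pick instances at random.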
Bibliographic reference. Han, Wenjing / Li, Haifeng / Ruan, Huabin / Ma, Lin / Sun, Jiayin / Schuller, Björn (2013): "Active learning for dimensional speech emotion recognition", In INTERSPEECH-2013, 2841-2845.