INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Active Learning for Dimensional Speech Emotion Recognition

Wenjing Han (1), Haifeng Li (1), Huabin Ruan (2), Lin Ma (1), Jiayin Sun (1), Björn Schuller (1)

(1) Harbin Institute of Technology, China
(2) Tsinghua University, China

State-of-the-art dimensional speech emotion recognition systems are trained on continuously labelled instances, and labelling this data is labour-intensive and time-consuming. In this paper, we propose applying active learning to reduce that effort: the unlabelled instances are evaluated automatically, and an informativeness measure selects only the most informative ones for a human to label. Specifically, we estimate the informativeness of each unlabelled instance from a binary-classification confidence score, i.e., the confidence with which an emotion is predicted to be negative or positive on a given emotional dimension. For verification, we run a pool-based and a stream-based scenario on part of the continuous AVEC 2012 task to demonstrate the feasibility of the proposed approach in practice. In both scenarios, our approach requires significantly fewer human-labelled instances than passive learning to reach a given performance.
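
The two selection scenarios named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier or the exact confidence score, so the sketch assumes each unlabelled instance already has a predicted probability of being "positive" on one emotional dimension, and it assumes confidence is the distance of that probability from 0.5. The function names `confidence`, `select_pool`, and `select_stream`, as well as the `k` and `threshold` parameters, are hypothetical.

```python
from typing import Iterable, List, Sequence


def confidence(p_positive: float) -> float:
    """Assumed confidence score: 0.0 at p = 0.5, 1.0 at p = 0 or p = 1."""
    return abs(p_positive - 0.5) * 2.0


def select_pool(probs: Sequence[float], k: int) -> List[int]:
    """Pool-based selection: rank the whole pool and return the indices
    of the k least confident (assumed most informative) instances."""
    ranked = sorted(range(len(probs)), key=lambda i: confidence(probs[i]))
    return ranked[:k]


def select_stream(probs: Iterable[float], threshold: float) -> List[int]:
    """Stream-based selection: decide for each instance in arrival order,
    querying a human label when confidence falls below the threshold."""
    return [i for i, p in enumerate(probs) if confidence(p) < threshold]


# Illustrative probabilities only; these are not data from the paper.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
print(select_pool(pool_probs, k=2))
print(select_stream(pool_probs, threshold=0.5))
```

In the pool-based case the whole unlabelled set is ranked before anything is sent to the annotator. In the stream-based case each instance is accepted or skipped on arrival, so a fixed threshold stands in for the ranking.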

Bibliographic reference.  Han, Wenjing / Li, Haifeng / Ruan, Huabin / Ma, Lin / Sun, Jiayin / Schuller, Björn (2013): "Active learning for dimensional speech emotion recognition", In INTERSPEECH-2013, 2841-2845.