12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote?

Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll

Technische Universität München, Germany

We present an extensive study on the performance of data agglomeration and decision-level fusion for robust cross-corpus emotion recognition. We compare joint training on multiple databases with late fusion of classifiers trained on single databases. The study employs six frequently used corpora of natural or elicited emotion (ABC, AVIC, DES, eNTERFACE, SAL, and VAM) and three classifiers (SVM, Random Forests, and Naive Bayes) to average out effects peculiar to any single corpus or classifier. Averaged over classifiers and databases, data agglomeration and majority voting deliver relative improvements in unweighted accuracy of 9.0% and 4.8%, respectively, over single-database cross-corpus classification of arousal, while majority voting performs best for valence recognition.
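The two strategies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`agglomerate`, `majority_vote`, `unweighted_accuracy`, `relative_improvement`), the tie-breaking rule, and the toy labels are all assumptions made for illustration, and unweighted accuracy is taken here to mean the mean of per-class recalls.

```python
from collections import Counter


def agglomerate(corpora):
    """Data agglomeration: pool several (features, labels) corpora into one training set."""
    features, labels = [], []
    for corpus_features, corpus_labels in corpora:
        features.extend(corpus_features)
        labels.extend(corpus_labels)
    return features, labels


def majority_vote(predictions):
    """Late fusion: combine per-classifier label lists into one list by majority vote.

    Ties go to the label that appears first among the votes
    (an assumption; the abstract does not state a tie rule).
    """
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, e.g. 0.09 for a 9.0% gain."""
    return (improved - baseline) / baseline


# Toy arousal labels (invented, not taken from the paper).
y_true = ["high", "high", "high", "low"]
clf_a = ["high", "high", "low", "low"]    # classifier trained on one corpus
clf_b = ["high", "low", "high", "high"]   # classifier trained on a second corpus
clf_c = ["low", "high", "high", "low"]    # classifier trained on a third corpus

fused = majority_vote([clf_a, clf_b, clf_c])
gain = relative_improvement(unweighted_accuracy(y_true, clf_a),
                            unweighted_accuracy(y_true, fused))
```

In this sketch, agglomeration happens before training (one classifier sees the pooled data), whereas voting happens after prediction (one classifier per corpus, fused at the decision level).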


Bibliographic reference. Schuller, Björn / Zhang, Zixing / Weninger, Felix / Rigoll, Gerhard (2011): "Using multiple databases for training in emotion recognition: to unite or to vote?", in INTERSPEECH-2011, 1553-1556.