INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Affective Classification of Generic Audio Clips Using Regression Models

Nikolaos Malandrakis (1), Shiva Sundaram (2), Alexandros Potamianos (3)

(1) University of Southern California, USA
(2) Audyssey Laboratories, USA
(3) Technical University of Crete, Greece

We investigate acoustic modeling, feature extraction and feature selection for the problem of affective content recognition of generic, non-speech, non-music sounds. We annotate and analyze a database of generic sounds drawn from a subset of the BBC sound effects library. We use regression models, long-term features and wrapper-based feature selection to model affect in the continuous 3-D (arousal, valence, dominance) emotional space. Frame-level features are extracted from each audio clip and combined with functionals to capture long-term temporal patterns over the duration of the clip. Experimental results show that the regression models achieve categorical performance similar to that of the more popular Gaussian Mixture Models. They are also capable of predicting accurate affective ratings on continuous scales, achieving 62.67% 3-class accuracy and 0.69-0.75 correlation with human ratings, higher than comparable numbers in the literature.


Bibliographic reference: Malandrakis, Nikolaos / Sundaram, Shiva / Potamianos, Alexandros (2013): "Affective classification of generic audio clips using regression models", in INTERSPEECH 2013, 2832-2836.