13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

A Robust Unsupervised Arousal Rating Framework using Prosody with Cross-Corpora Evaluation

Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan

Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA

This paper presents an unsupervised method for producing a bounded rating of affective arousal from speech. One of the major challenges in such behavioral signal classification is the design of methods that generalize well across domains and datasets. We propose a framework that provides robustness across databases by selecting coherent features based on empirical and theoretical evidence, fusing activation confidences from multiple features, and effectively weighting the soft labels without knowing the true labels. Spearman's rank correlations (with binary classification accuracies in parentheses) on four arousal databases are 0.62 (73%), 0.77 (86%), 0.70 (82%), and 0.65 (73%).
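The abstract does not spell out the scoring, fusion, or weighting rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each prosodic feature is turned into a bounded score in [-1, 1] from its rank within the data, and that label-free weights come from how well each feature agrees with the unweighted mean of all features. The function names (`rank_score`, `fuse`, `binary_accuracy`), the two features, and all values are hypothetical toy choices. The two metrics at the end mirror the ones reported above: Spearman's rank correlation and binary classification accuracy.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rank_score(values):
    """ASSUMPTION: bounded per-feature score in [-1, 1] from the rank."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [2 * (r - 1) / (n - 1) - 1 for r in ranks(values)]

def fuse(features):
    """ASSUMPTION: weight each feature by its agreement with the
    unweighted mean score, so no true labels are needed."""
    scores = [rank_score(f) for f in features]
    n = len(scores[0])
    avg = [mean(s[i] for s in scores) for i in range(n)]
    weights = [max(spearman(s, avg), 0.0) for s in scores]
    total = sum(weights) or 1.0
    return [sum(w * s[i] for w, s in zip(weights, scores)) / total
            for i in range(n)]

def binary_accuracy(rating, labels):
    """A rating above 0 predicts high arousal (label 1)."""
    return mean(1.0 if (r > 0) == bool(l) else 0.0
                for r, l in zip(rating, labels))

# Toy per-utterance values, invented for illustration only.
pitch = [110, 180, 150, 220, 130, 200]
intensity = [55, 70, 62, 75, 60, 68]
annotated = [1.5, 3.8, 2.4, 4.6, 2.0, 4.1]   # hypothetical mean rater scores
labels = [0, 1, 0, 1, 0, 1]                  # hypothetical binary labels

rating = fuse([pitch, intensity])
print("Spearman:", spearman(rating, annotated))
print("Accuracy:", binary_accuracy(rating, labels))
```

Because the scores are built from ranks, the fused rating stays inside [-1, 1] no matter what units the underlying features use, which is one simple way to obtain a bounded rating that can be compared across corpora.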

Index Terms: arousal rating, activation, unsupervised, knowledge-based, inter-rater reliability, cross-corpora


Bibliographic reference. Bone, Daniel / Lee, Chi-Chun / Narayanan, Shrikanth S. (2012): "A robust unsupervised arousal rating framework using prosody with cross-corpora evaluation", In INTERSPEECH-2012, 1175-1178.