INTERSPEECH 2006 - ICSLP
Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Timing Levels in Segment-Based Speech Emotion Recognition

Björn Schuller, Gerhard Rigoll

Technische Universität München, Germany

Additional sub-phrase-level information is believed to improve accuracy in speech emotion recognition systems. Yet automatic segmentation along word or syllable boundaries is a challenge in its own right. Furthermore, it remains unclear which timing level leads to optimal results. In this paper we therefore quantitatively compare three approaches to segment-level features based on 276 statistical high-level prosodic, articulatory, and speech quality features. Besides the choice of the optimal segmentation scheme, we also analyze the fusion of segments with respect to classification and the combination of diverse timing levels. Tests are carried out on the popular Berlin Database of Emotional Speech (EMO-DB). A significant improvement over existing work is reported for the combination of phrase-level features with relative time interval features.
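The abstract does not spell out how relative time interval features are built. The sketch below assumes a common reading: each utterance's frame-level contour (for example, pitch values per frame) is split into a fixed number of intervals of equal relative duration, statistical functionals are computed per interval, and these are concatenated with the same functionals computed over the whole phrase. The functional set (mean, standard deviation, min, max, range), the function names, and the default of three segments are hypothetical illustrations; this is not the paper's 276-feature set or its exact procedure.

```python
import statistics
from typing import List, Sequence


def functionals(contour: Sequence[float]) -> List[float]:
    """Hypothetical subset of statistical functionals: mean, std, min, max, range."""
    values = list(contour)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lo, hi = min(values), max(values)
    return [mean, std, lo, hi, hi - lo]


def relative_segments(contour: Sequence[float], n_segments: int) -> List[List[float]]:
    """Split a frame-level contour into n_segments intervals of (near-)equal relative length.

    Integer floor division keeps every segment non-empty whenever
    len(contour) >= n_segments, independent of the absolute utterance duration.
    """
    n = len(contour)
    if n_segments < 1 or n < n_segments:
        raise ValueError("need at least one frame per segment")
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    return [list(contour[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def combined_features(contour: Sequence[float], n_segments: int = 3) -> List[float]:
    """Concatenate phrase-level functionals with per-segment (relative interval) functionals."""
    phrase_level = functionals(contour)
    segment_level = [
        value
        for segment in relative_segments(contour, n_segments)
        for value in functionals(segment)
    ]
    return phrase_level + segment_level
```

Because the intervals are defined relative to utterance length, every utterance yields a feature vector of the same size (here 5 phrase-level values plus 5 per segment), so a static classifier can consume it directly without any word or syllable segmentation.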


Bibliographic reference.  Schuller, Björn / Rigoll, Gerhard (2006): "Timing levels in segment-based speech emotion recognition", In INTERSPEECH-2006, paper 1695-Wed2BuP.8.