Speech Prosody 2006

Dresden, Germany
May 2-5, 2006

Emotion Recognition in the Noise Applying Large Acoustic Feature Sets

Björn Schuller, Dejan Arsic, Frank Wallhoff, Gerhard Rigoll

Institute for Human-Machine Communication, Technische Universität München, Germany

Speech emotion recognition is mostly studied under ideal acoustic conditions: acted and elicited samples in studio quality are used, alongside sparse work on spontaneous field data. However, the influence of noise is an important factor in speech processing and has so far been practically ignored in this context. We therefore discuss affect estimation under noise conditions. On three well-known public databases (DES, EMO-DB, and SUSAS), we show the effects of post-recording noise addition at various dB levels, as well as performance under noise conditions during signal capture. To cope with this new challenge, we extend the generation of functionals by extracting a large set of about 4k high-level features from more than 60 partially novel base contours. These comprise, among others, intonation, intensity, formants, HNR, MFCC, and VOC19. A fast Information-Gain-Ratio filter selection picks attributes according to the noise conditions. Results are presented using Support Vector Machines as the classifier.
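
As an illustration of post-recording noise addition at a chosen dB level, the following is a minimal sketch and not the procedure used in the paper. It assumes the dB level is a signal-to-noise ratio computed from mean power over the whole utterance, and that a noise recording shorter than the utterance is simply looped. The names mean_power and add_noise_at_snr are hypothetical.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech so that the result has the requested SNR in dB.

    Assumption: SNR is defined on mean power over the whole utterance.
    The noise is looped to cover the full length of the speech signal.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise so it covers the full utterance.
    fitted = [noise[i % len(noise)] for i in range(len(speech))]

    p_noise = mean_power(fitted)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")

    # Solve P_speech / (gain^2 * P_noise) = 10^(snr_db / 10) for the gain.
    gain = math.sqrt(mean_power(speech) / (p_noise * 10 ** (snr_db / 10)))

    return [s + gain * n for s, n in zip(speech, fitted)]
```

Calling add_noise_at_snr(speech, noise, 10.0) returns a signal of the same length as speech whose added noise component lies 10 dB below the speech in mean power. Lower values of snr_db give noisier output.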

Bibliographic reference.  Schuller, Björn / Arsic, Dejan / Wallhoff, Frank / Rigoll, Gerhard (2006): "Emotion recognition in the noise applying large acoustic feature sets", In SP-2006, paper 128.