12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Vowels Formants Analysis Allows Straightforward Detection of High Arousal Acted and Spontaneous Emotions

Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth

Otto-von-Guericke-Universität Magdeburg, Germany

The role of automatic emotion recognition from speech is growing continually because of the accepted importance of reacting to the user's emotional state in human-computer interaction. Most state-of-the-art emotion recognition methods are based on context-independent turn- and frame-level analysis. Our earlier ICME 2011 article showed that robust detection of high-arousal acted emotions can be performed on a context-dependent vowel basis. In contrast to HMM/GMM classification with 39-dimensional MFCC vectors, a much more convenient Neyman-Pearson criterion with only a single average F1 value is employed here. In this paper we apply the proposed method to spontaneous emotion recognition from speech. We also avoid speaker-dependent acoustic features in favor of gender-specific ones. Finally, we compare the performance on acted and spontaneous emotions for different criterion threshold values.
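
The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that F1 denotes the first formant frequency (as the title suggests), that a higher average F1 indicates high arousal, and that the Neyman-Pearson criterion is applied by bounding the empirical false-alarm rate on neutral speech at a chosen level `alpha`. The function names, the gender labels, and all numeric values are hypothetical.

```python
import numpy as np


def neyman_pearson_threshold(neutral_mean_f1, alpha):
    """Return a threshold on average F1 (Hz) whose empirical false-alarm
    rate on the neutral (low-arousal) samples is at most alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    values = np.sort(np.asarray(neutral_mean_f1, dtype=float))
    n = values.size
    if n == 0:
        raise ValueError("need at least one neutral sample")
    # At most k neutral samples may lie strictly above the threshold.
    k = int(np.floor(alpha * n))
    return float(values[n - k - 1])


def is_high_arousal(mean_f1, threshold):
    """Assumed decision rule: flag high arousal when average F1 exceeds the threshold."""
    return mean_f1 > threshold


# Made-up average F1 values (Hz) for neutral speech, one list per gender.
neutral_f1_by_gender = {
    "female": [520, 540, 555, 560, 575, 590, 600, 640],
    "male": [430, 445, 450, 460, 470, 480, 495, 530],
}

# Gender-specific thresholds instead of speaker-dependent ones.
thresholds = {
    gender: neyman_pearson_threshold(values, alpha=0.25)
    for gender, values in neutral_f1_by_gender.items()
}

print(thresholds)
print(is_high_arousal(610.0, thresholds["female"]))
print(is_high_arousal(470.0, thresholds["male"]))
```

Varying `alpha` moves the threshold and trades false alarms against missed detections, which is one way to read the comparison across criterion threshold values mentioned in the abstract.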


Bibliographic reference. Vlasenko, Bogdan / Prylipko, Dmytro / Philippou-Hübner, David / Wendemuth, Andreas (2011): "Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions", in INTERSPEECH-2011, 1577-1580.