The role of automatic emotion recognition from speech is growing continually, owing to the accepted importance of reacting to the emotional state of the user in human-computer interaction. Most state-of-the-art emotion recognition methods are based on context-independent turn- and frame-level analysis. In our earlier ICME 2011 article we showed that robust detection of high-arousal acted emotions can be performed on a context-dependent vowel basis. In contrast to HMM/GMM classification with 39-dimensional MFCC vectors, a much more convenient Neyman-Pearson criterion with only one average F1 (first formant) value is employed here. In this paper we apply the proposed method to spontaneous emotion recognition from speech. We also avoid the use of speaker-dependent acoustic features in favor of gender-specific ones. Finally, we compare the performance on acted and spontaneous emotions for different criterion threshold values.
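The single-feature decision rule described above can be sketched as follows. This is only an illustration under our own assumptions (synthetic feature values, a simple empirical false-alarm bound), not the authors' implementation: a vowel segment is flagged as high-arousal when its average first formant (F1) exceeds a threshold, and the threshold is chosen Neyman-Pearson style to bound the false-alarm rate on neutral segments.

```python
# Illustrative sketch (not the paper's implementation): Neyman-Pearson-style
# thresholding on the average first formant (F1) of vowel segments.
# All feature values below are synthetic assumptions for demonstration.

def detect_high_arousal(avg_f1_hz, threshold_hz):
    """Flag a vowel segment as high-arousal if its average F1 exceeds the threshold."""
    return avg_f1_hz > threshold_hz

def neyman_pearson_threshold(neutral_f1_values, max_false_alarm_rate):
    """Pick the smallest candidate threshold whose false-alarm rate on
    neutral (non-emotional) segments does not exceed the given bound."""
    candidates = sorted(neutral_f1_values)
    for t in candidates:
        false_alarms = sum(1 for v in neutral_f1_values if v > t)
        if false_alarms / len(neutral_f1_values) <= max_false_alarm_rate:
            return t
    return candidates[-1]

# Synthetic average-F1 measurements (Hz) for neutral vowel segments:
neutral = [520, 540, 555, 560, 580, 590, 600, 610]
t = neyman_pearson_threshold(neutral, max_false_alarm_rate=0.25)
print(detect_high_arousal(700, t))  # high-arousal speech tends to raise F1 -> True
```

Varying `max_false_alarm_rate` moves the threshold and trades off misses against false alarms, which corresponds to the comparison over different criterion threshold values mentioned in the abstract.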
Bibliographic reference. Vlasenko, Bogdan / Prylipko, Dmytro / Philippou-Hübner, David / Wendemuth, Andreas (2011): "Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions", In INTERSPEECH-2011, 1577-1580.