Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Emotion Recognition of Speech Signal in Mandarin

Sheng Zhang (1), P. C. Ching (2), Fanrang Kong (1)

(1) University of Science & Technology of China, China; (2) Chinese University of Hong Kong, China

Traditionally, the emotional state of a speaker is classified, in addition to the content of the utterance, by a simultaneous recognition process that uses a single feature set of the spoken utterance. However, an analysis of the classification performance for every pair of emotions shows that different features have distinctive classification abilities for different emotions. Therefore, we propose an efficient emotion recognition process called cascade bisection (CB-process), which carries out emotion recognition through several bisecting steps and applies a different feature set at every step. This process is based on the features' differing abilities to classify emotions. In this way, we can fully utilize the information extracted from the features and achieve better recognition performance. Five discrete emotional states, namely neutral, anger, fear, joy, and sadness, are distinguished from the input Mandarin speech. After extracting the acoustic features that contain information on short-time energy (amplitude), signal amplitude, and pitch, we derive the representation feature set for further use in the CB-process, which achieves better emotion recognition, as demonstrated by the experimental results.

Bibliographic reference. Zhang, Sheng / Ching, P. C. / Kong, Fanrang (2006): "Automatic emotion recognition of speech signal in Mandarin", in INTERSPEECH-2006, paper 1128-Wed2BuP.6.