The goal of speech emotion recognition (SER) is to identify the emotional or physical state of a human being from his or her voice. One of the most important things in a SER task is to extract and select relevant speech features with which most emotions could be recognized. In this paper, we present a smoothed nonlinear energy operator (SNEO)-based amplitude modulation cepstral coefficients (AMCC) feature for recognizing emotions from speech signals. SNEO estimates the energy required to produce the AM-FM signal, and then the estimated energy is separated into its amplitude and frequency components using an energy separation algorithm (ESA). AMCC features are obtained by first decomposing a speech signal using a C-channel gammatone filterbank, computing the AM power spectrum, and taking a discrete cosine transform (DCT) of the root compressed AM power spectrum. Conventional MFCC (Mel-frequency cepstral coefficients) and Mel-warped DFT (discrete Fourier transform) spectrum based cepstral coefficients (MWDCC) features are used for comparing the recognition performances of the proposed features. Emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. It is observed from the experimental results that the AMCC features provide a relative improvement of approximately 3.5% over the baseline MFCC.
Bibliographic reference. Alam, Md. Jahangir / Attabi, Yazid / Dumouchel, Pierre / Kenny, Patrick / O'Shaughnessy, Douglas (2013): "Amplitude modulation features for emotion recognition from speech", In INTERSPEECH-2013, 2420-2424.