Some Non-F0 Cues to Emotional Speech: An Experiment with Morphing

Donna Erickson (1), Takaaki Shochi (2), Caroline Menezes (3), Hideki Kawahara (4), Ken-Ichi Sakakibara (5)

(1) Showa Music University, Kawasaki City, Japan; (2) Gipsa-Lab, Grenoble, France; (3) Goa, India; (4) Wakayama University, Wakayama, Japan; (5) Health Sciences University of Hokkaido, Japan

This paper investigates some non-F0 cues to emotional speech. Two samples of the word "leave" were taken from spontaneous speech: one spoken with emotion (sad) and the other without emotion. Using the morphing algorithm of STRAIGHT [1], we generated a series of 12 stimuli ranging from the non-emotional "leave" to the emotional "leave", with F0 held constant at 300 Hz. Perception test results show that the morphed stimuli could be identified as sad, with stimulus 12 heard as the most emotional. A simple correlation analysis, together with a principal component analysis (PCA) of listeners’ perceptual behavior, suggests that formant frequencies, specifically the lowering of F2, F3, and F4, are important cues for the perception of emotional (sad) speech.
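The analysis steps named in the abstract (a 12-step continuum, a simple correlation, and a PCA of listener responses) can be illustrated with a minimal sketch. Everything numeric here is a placeholder: the equal spacing of the morphing rates, the F2 values, the 20 listeners, and the 1-to-5 rating scale are assumptions made for illustration, not data or settings from the study, and the code does not call STRAIGHT itself.

```python
import numpy as np


def morph_rates(n_steps=12):
    """Morphing rates from 0 (non-emotional) to 1 (emotional).

    Equal spacing is an assumption; the abstract only states 12 stimuli.
    """
    return np.linspace(0.0, 1.0, n_steps)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def pca(ratings):
    """PCA of a (listeners x stimuli) matrix via SVD.

    Returns the principal axes (rows) and the fraction of variance
    explained by each component.
    """
    centered = ratings - ratings.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt, explained


# Placeholder data (hypothetical, NOT measurements from the study).
rng = np.random.default_rng(0)
rates = morph_rates()

# Hypothetical F2 track that falls as the morph approaches the sad endpoint.
f2_hz = 2000.0 - 200.0 * rates

# Hypothetical ratings: 20 listeners x 12 stimuli, rising with morphing rate.
ratings = 1.0 + 4.0 * rates[None, :] + rng.normal(0.0, 0.3, size=(20, 12))

mean_rating = ratings.mean(axis=0)
r_f2 = pearson_r(f2_hz, mean_rating)
components, explained = pca(ratings)

print(f"correlation between F2 and mean rating: {r_f2:.2f}")
print(f"variance explained by first component: {explained[0]:.2f}")
```

With real data, `f2_hz` would be replaced by measured formant values for each stimulus (and likewise for F3 and F4), and `ratings` by the listeners' actual responses.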


  1. Kawahara, H.; Matsui, H., 2003. Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. Proc. IEEE ICASSP.

