Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
We will report on our current work on modelling extralinguistic features in speech. As a starting point, we analysed sentences produced by actors to represent different emotions. This material had been evaluated by a listener panel in another project, and sentences with successfully acted emotions were selected for further analysis. The analysis corroborated earlier findings concerning speech tempo and fundamental frequency contours. We also found differences in segmental phonetic realizations, partially correlated with speaker effort, as in energetic, angry and restrained, sad speaking styles. The parametric representation of the analysis was simplified to conform with our rule-based text-to-speech system. Several manipulated synthetic versions were compared and evaluated for perceived emotions. The experiment showed that emotions are signalled through a complex interaction of segmental and prosodic cues. In addition to the commonly used parameters, such as fundamental frequency dynamics and level, speaking rate, and segmental realization, we also observed supporting non-phonetic sounds that enhanced the perceived emotional quality. In our presentation we will give examples of the synthetic stimuli and discuss potential problems in modelling emotional speech within the present framework.
Bibliographic reference. Carlson, Rolf / Granström, Björn / Nord, Lennart (1992): "Experiments with emotive speech - acted utterances and synthesized replicas", In ICSLP-1992, 671-674.