Speech Prosody 2006
A speaker's mental state is often conveyed by acoustic and prosodic factors, as well as by the words and gestures the speaker chooses. Considerable research has been done in recent years on detecting emotional state in IVR systems, so that angry or frustrated users can be routed to a human agent. Other research has sought to identify a wider variety of emotions and intentions in recorded meetings, again from acoustic and prosodic cues. From the perspective of speech generation, the problem of conveying emotional state has emerged as a critical topic in the continuing effort to make TTS systems sound more like real human beings. Computer game designers and IVR system developers alike cite the limited prosodic and emotional 'naturalness' of current systems as a barrier to adopting them.
In this talk I will describe ongoing research in the speech group at Columbia designed to expand the variety of speaker states that can be identified and produced through acoustic and prosodic variation. I will describe recent work on detecting confidence and uncertainty in a physics tutoring system (joint work with the University of Pittsburgh), work identifying the acoustic and prosodic characteristics of 'charismatic' speech across cultures, and research into the acoustic and prosodic indicators of deceptive speech (joint work with the University of Colorado and SRI International). I will also describe recent progress in the automatic detection of prosodic features, which should make both recognition and generation of the prosodic characteristics of speaker state more accurate.
Bibliographic reference. Hirschberg, Julia (2006): "Recognizing and conveying speaker state prosodically", In SP-2006, paper KN1.