Speech Prosody 2002

Aix-en-Provence, France
April 11-13, 2002

Speech Technology, ToBI, and Making Sense of Prosody

Hansjörg Mixdorff

Faculty of Computer Science, Berlin University of Applied Sciences, Germany

This paper critically examines why prosodic knowledge has not yet found its way into commercial applications of speech technology. We identify the capability of understanding and expressing meaning through prosodic features of speech as a key to potential improvements in speech recognition and synthesis. We suggest that even a complete and ‘correct’ ToBI transcription will always remain an intermediate and possibly incomplete stage of representation between the intended meaning of a message and the resulting speech signal. Examining the correspondences between a version of G-ToBI and the quantitative, syllable-based integrated model developed by the author, which uses the Fujisaki model to parametrize F0 contours, we conclude that ToBI accent labels can be derived from Fujisaki parameters. Finally, we show that perceived prominence, which can be thought of as the result of perceptual integration of various prosodic cues with respect to the information structure of an utterance, can be reliably predicted from accent command amplitudes and normalized syllabic durations.

Bibliographic reference. Mixdorff, Hansjörg (2002): "Speech technology, ToBI, and making sense of prosody", in SP-2002, 31-37.