Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Form versus Function – Prosodic Annotation and Modeling Go Hand in Hand

Hansjörg Mixdorff

Department of Computer Science and Media, Beuth University of Applied Sciences, Berlin, Germany

This paper argues that prosodic annotation and modeling should be combined to facilitate analyses of prosodic functions that invariably require perceptual judgments. It compares perceptual annotations of prominent syllables and phrase boundaries with labels produced by combining linguistic information from a TTS front-end, model-based prosodic features, and a model of perceived syllabic prominence from an earlier study. As expected, this annotation of prosodic landmarks yields better results on read speech than on spontaneous speech data. On average, 89% of perceptually prominent syllables were identified correctly, as was a similar percentage of prosodic boundaries. This yields a basic annotation of prosodic features which can later be enhanced with additional information for which perceptual judgments are indispensable.
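To make the idea of deriving prominence labels from model-based features concrete, here is a minimal sketch. The feature names, weights, and threshold below are invented for illustration and are not taken from the paper; the actual system combines a TTS front-end, Fujisaki-model parameters, and a trained prominence model.

```python
# Illustrative sketch only: feature names, weights, and the threshold
# are hypothetical, not the paper's actual model.

def label_prominence(syllables, weights=(0.6, 0.4), threshold=0.5):
    """Mark a syllable as prominent when a weighted sum of
    normalized model-based features exceeds a threshold."""
    labels = []
    for syl in syllables:
        score = (weights[0] * syl["accent_amplitude"]  # e.g. Fujisaki accent command
                 + weights[1] * syl["duration_z"])     # e.g. normalized duration
        labels.append(score >= threshold)
    return labels

syls = [
    {"accent_amplitude": 0.8, "duration_z": 0.7},
    {"accent_amplitude": 0.1, "duration_z": 0.2},
]
print(label_prominence(syls))  # [True, False]
```

A real system would of course learn such weights from perceptually annotated data rather than fix them by hand.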

Index Terms: Prosodic annotation, prosodic modeling, Fujisaki model, perceptual prominence

Bibliographic reference.  Mixdorff, Hansjörg (2012): "Form versus function – prosodic annotation and modeling go hand in hand", In SP-2012, 621-623.