Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Learning Prosodic Sequences Using the Fundamental Frequency Variation Spectrum

Kornel Laskowski (1), Jens Edlund (2), Mattias Heldner (2)

(1) interACT, Carnegie Mellon University, Pittsburgh PA, USA
(2) Centre for Speech Technology, KTH, Stockholm, Sweden

We investigate a recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well suited to statistical sequence modeling. We show what the representation looks like, and apply hidden Markov models to learn prosodic sequences characteristic of higher-level turn-taking phenomena. Our analysis shows that the models learn exactly those characteristics that have been reported for these phenomena in the literature. Further refinements to the representation lead to a 12-17% relative improvement in speaker change prediction for conversational spoken dialogue systems.

Bibliographic reference. Laskowski, Kornel / Edlund, Jens / Heldner, Mattias (2008): "Learning prosodic sequences using the fundamental frequency variation spectrum", in SP-2008, 151-154.