Speech Prosody 2008
We investigate a recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well suited to statistical sequence modeling. We show what the representation looks like, and apply hidden Markov models to learn prosodic sequences characteristic of higher-level turn-taking phenomena. Our analysis shows that the models learn exactly those characteristics that have been reported for the phenomena in the literature. Further refinements to the representation lead to a 12–17% relative improvement in speaker change prediction for conversational spoken dialogue systems.
Bibliographic reference. Laskowski, Kornel / Edlund, Jens / Heldner, Mattias (2008): "Learning prosodic sequences using the fundamental frequency variation spectrum", In SP-2008, 151-154.