13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech

Nanette Veilleux (1), Jonathan Barnes (2), Alejna Brugos (2), Stefanie Shattuck-Hufnagel (3)

(1) Computer Science and Informatics, Simmons College, Boston, MA, USA
(2) Romance Studies & Applied Linguistics, Boston University, Boston, MA, USA
(3) Research Lab of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA

Recent studies have shown that the Tonal Center of Gravity (TCoG) is a better classifier than F0 Turning Points for at least two contrastively timed pitch accents in American English intonation contours. Within this framework, a binary F0 weighting function derived from the F0 contour can be used in place of the natural F0 contour without degrading discrimination performance. This success has important implications for speech synthesis. Just as we can capture the functional equivalence of a multitude of auditorily distinct F0 contour shapes in terms of their mapping to a single parameter (the TCoG) via a set of binary weighting functions, the same mapping could be run in reverse to generate natural-sounding variability in speech synthesis.
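
As a rough illustration of the idea in the abstract, the sketch below computes a TCoG as the weighted mean of time over a contour, first with a binary weighting derived from F0 and then for two differently shaped binary plateaus that share one TCoG. It is a minimal sketch under stated assumptions, not the authors' implementation: the mid-range threshold used to binarize F0, the sample contours, and the helper names (tcog, binary_weights, plateau) are all hypothetical and do not come from the paper.

```python
def tcog(times, weights):
    """Weighted mean of time: sum(w_i * t_i) / sum(w_i)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; TCoG is undefined")
    return sum(w * t for w, t in zip(weights, times)) / total


def binary_weights(f0, threshold):
    """Assumed binarization: weight 1 where F0 reaches the threshold, else 0."""
    return [1.0 if f >= threshold else 0.0 for f in f0]


def plateau(times, center, half_width):
    """Hypothetical binary plateau of weight 1 centered on a target time."""
    return [1.0 if abs(t - center) <= half_width else 0.0 for t in times]


# Invented sample data: times in ms, F0 in Hz, one value every 10 ms.
times = list(range(0, 110, 10))
early = [100, 120, 160, 180, 170, 140, 120, 110, 105, 100, 100]
late = list(reversed(early))  # same shape, mirrored so the peak comes later

# Assumption: binarize at the midpoint of the F0 range (140 Hz here).
threshold = (min(early) + max(early)) / 2

early_tcog = tcog(times, binary_weights(early, threshold))
late_tcog = tcog(times, binary_weights(late, threshold))
print(early_tcog, late_tcog)  # the early-peak contour gets the smaller value

# "In reverse": distinct binary shapes that map to the same target TCoG.
narrow = plateau(times, center=50, half_width=10)
wide = plateau(times, center=50, half_width=30)
print(tcog(times, narrow), tcog(times, wide))
```

In this toy setup the early and late contours are separated by their TCoG values alone, and the narrow and wide plateaus differ in shape while sharing a TCoG, which is the kind of many-to-one mapping the abstract proposes exploiting as a source of variability in synthesis.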

Index Terms: Tonal Center of Gravity, F0 alignment, pitch accent classification, prosody, speech synthesis

Bibliographic reference. Veilleux, Nanette / Barnes, Jonathan / Brugos, Alejna / Shattuck-Hufnagel, Stefanie (2012): "Perceptual foundations for naturalistic variability in the prosody of synthetic speech", in INTERSPEECH-2012, 2534-2537.