Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

A Model for Varying Speaking Style in TTS Systems

Sophie Roekhaut (1,2), Jean-Philippe Goldman (3,4), Anne Catherine Simon (4)

(1) TCTS Lab, Université de Mons, Belgium; (2) CENTAL, Université catholique de Louvain, Belgium
(3) Département de Linguistique, Université de Genève, Switzerland
(4) Institut Langage et Communication/Centre Valibel – Discours et Variation, Université catholique de Louvain, Belgium

This paper aims to enhance the performance of a TTS system by generating various speaking styles. First, we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody of the “neutral” speech produced by the eLite TTS system [1]. The differences concern about 20 prosodic characteristics (F0 span, speech rate, pauses and hesitations, primary and secondary accentuation, schwa deletion, etc.). To make the neutral speech resemble a typical speaking style, these prosodic characteristics are implemented either within the TTS system itself or during a postprocessing step. The quality of the “stylized” synthesis is evaluated by comparing it to the original style.

Index Terms: speaking styles, speech synthesis, French prosody, accentuation, pauses, hesitations

References

  1. Beaufort, R. and Ruelle, A. “eLite: système de synthèse de parole à orientation linguistique”. Proc. of JEP, 509-512, 2006.

Bibliographic reference. Roekhaut, Sophie / Goldman, Jean-Philippe / Simon, Anne Catherine (2010): "A model for varying speaking style in TTS systems", in SP-2010, paper 096.