Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Three Methods of Intonation Modeling

Ann K. Syrdal (1), Gregor Möhler (2), Kurt Dusterhoff (3), Alistair Conkie (1), Alan W. Black (3)

(1) AT&T Labs, Research, Florham Park, NJ, USA
(2) Institute of Natural Language Processing, University of Stuttgart, Germany
(3) Centre for Speech Technology Research, University of Edinburgh, Scotland, UK

This paper compares different methods of generating intonation for an American English Text-to-Speech synthesis system. We look at a primarily rule-based approach and two data-driven approaches.

For data-driven modeling we used two separate data sets, each representing a somewhat different prosodic style. One database was recordings of a portion of 1989 Wall Street Journal text from the Penn Treebank Project. The second database was recordings of interactive prompts used in telephone network services. Both were read by the same female speaker. Approximately two and one-half hours of speech was phonetically and prosodically segmented and labeled (first automatically, and subsequently verified manually). The prosodic labeling used ToBI [1] tones and breaks. Three different intonation models were compared: (1) a predominantly rule-based model based on ToBI labels [3]; (2) a parametric model using the Tilt approach [2]; and (3) a Vector Quantized model based on an underlying parametric representation [4]. Sentences representative of both prosodic styles were synthesized with each of these models, and were presented to listeners for subjective ratings in a formal listening test. The results of the evaluation are reported.

References

  1. K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, and J. Hirschberg. ToBI: a standard for labeling English prosody. ICSLP, 2:867-870, 1992.
  2. P. Taylor and A. Black. Synthesizing conversational intonation from a linguistically rich input. In Proc. ESCA Workshop on Speech Synthesis, pages 175-178, Mohonk, NY, 1994.
  3. M. Jilka. Regelbasierte Generierung natürlich klingender Intonationsmuster des Amerikanischen Englisch (Rule-based generation of naturally sounding intonation patterns of American English). Institute of Natural Language Processing, University of Stuttgart, 1996.
  4. G. Möhler and A. Conkie. Parametric modeling of intonation using vector quantization. In Third International Workshop on Speech Synthesis, Jenolan Caves, Australia, 1998.


Bibliographic reference.  Syrdal, Ann K. / Möhler, Gregor / Dusterhoff, Kurt / Conkie, Alistair / Black, Alan W. (1998): "Three Methods of Intonation Modeling", In SSW3-1998, 305-310.