Speech Prosody 2002
Text-to-speech synthesis offers an interesting way of bringing together the various components of knowledge related to speech production. To a certain extent, it provides a new and highly systematic way of testing the coherence of our understanding of speech production. For example, speech rhythm and the temporal organisation of speech have to be well captured in order to mimic a speaker correctly.
The simulation approach used in our laboratory for two languages (French and German) supports our original hypothesis of multidimensionality and non-linearity in the production of speech rhythm. This paper presents an overview of our approach to this issue, as it has been developed over the last few years.
We conceive the production of speech rhythm as a multidimensional task, and the temporal organisation of speech (i.e., the establishment of temporal boundaries and durations) as a key component of this task. As a result of this multidimensionality, text-to-speech systems have to accommodate a number of systematic transformations and computations at various levels. Our model of the temporal organisation of read speech in French and German emerges from a combination of quantitative and qualitative parameters, organised according to psycholinguistic and linguistic structures. (An ideal speech synthesiser would also take into account subphonemic as well as pragmatic parameters. However, such systems are not yet available.)
Online Synthesis: http://www.unil.ch/servlets/imm/SyntheseServlet
Sound examples: http://www.unil.ch/imm/docs/LAIP/LAIPTTS.html
Bibliographic reference. Zellner Keller, Brigitte (2002): "Revisiting the status of speech rhythm", In SP-2002, 727-730.