September 22-25, 1997
This paper presents the realisation of an automatically trainable computational prosodic model for French Text-to-Speech Synthesis. The model is built in two steps. The first step consists of predicting the fundamental frequency (F0) contours and the durations of syllables from prosodic markers, using neural networks [17,12]. In this step, the prosodic markers are extracted automatically from the signal by analysing prosodic realisations and identifying a prosodic alphabet and a set of labelling rules. The second step integrates the model into the CNET Text-to-Speech Synthesis system by using its linguistic levels and predicting prosodic markers from text and linguistic labels. The system is evaluated by naïve listeners and compared with the current CNET Text-to-Speech Synthesis system.
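As a rough illustration of the first step, the sketch below trains a small feed-forward network that maps a prosodic marker on a syllable to a normalised F0 value and a normalised duration. It is not the author's model. The marker alphabet (`MARKERS`), the input encoding (a one-hot marker plus the syllable's relative position in the phrase), the network size, and the toy training pairs (`TOY_DATA`) are hypothetical placeholders, because the abstract does not specify the actual features, architecture, or data.

```python
import math
import random

# Hypothetical prosodic alphabet: placeholders, NOT the labels identified in the paper.
MARKERS = ["none", "minor_rise", "major_rise", "fall"]


def encode(marker, rel_position):
    """One-hot prosodic marker plus the syllable's relative position in the phrase (0..1)."""
    vec = [1.0 if m == marker else 0.0 for m in MARKERS]
    vec.append(rel_position)
    return vec


class MLP:
    """One tanh hidden layer, linear outputs: [normalised F0, normalised duration]."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient descent step on squared error; returns the loss."""
        h, y = self.forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]  # dL/dy for L = 0.5 * sum(err^2)
        # Backpropagate to the hidden layer before the output weights change.
        dh = [(1 - h[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
              for j in range(len(h))]
        for k in range(len(err)):
            for j in range(len(h)):
                self.w2[k][j] -= lr * err[k] * h[j]
            self.b2[k] -= lr * err[k]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w1[j][i] -= lr * dh[j] * x[i]
            self.b1[j] -= lr * dh[j]
        return 0.5 * sum(e * e for e in err)


def mean_loss(model, data):
    """Average squared-error loss over (input, target) pairs."""
    total = 0.0
    for x, t in data:
        _, y = model.forward(x)
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
    return total / len(data)


# Invented toy pairs for illustration only: (encoded syllable, [F0, duration]).
TOY_DATA = [
    (encode("none", 0.2), [0.0, 0.0]),
    (encode("minor_rise", 0.5), [0.5, 0.3]),
    (encode("major_rise", 0.9), [1.0, 0.8]),
    (encode("fall", 1.0), [-0.8, 0.6]),
]

if __name__ == "__main__":
    model = MLP(n_in=len(MARKERS) + 1, n_hidden=8, n_out=2)
    print("loss before:", mean_loss(model, TOY_DATA))
    for _ in range(500):
        for x, t in TOY_DATA:
            model.train_step(x, t)
    print("loss after:", mean_loss(model, TOY_DATA))
    print("predicted [F0, duration] for a final fall:", model.forward(encode("fall", 1.0))[1])
```

Under this framing, the second step would feed the same network with markers predicted from text and linguistic labels rather than markers extracted from the signal; the sketch covers only the marker-to-prosody mapping.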
Bibliographic reference. Tournemire, Stéphanie de (1997): "Identification and automatic generation of prosodic contours for a text-to-speech synthesis system in French", In EUROSPEECH-1997, 191-194.