September 22-25, 1997
Most existing prosody and intonation models have been obtained from a statistical analysis of F0 curves and from resynthesis by TTS. There is, however, another way to improve quality and naturalness: useful results can also be obtained by analysing listeners' intuitions about natural intonational behavior. To this end, we use a digital process that generates signals representing only the melody of the original speech signal. This makes comprehensive listening experiments possible, in which the perception of natural and synthetic intonation can be analysed and compared. Based on the results of several listening experiments, a statistical analysis of the F0 curves was carried out, taking into account that a speaker-individual intonation model needs more quantitative F0 information than traditional descriptions provide. The aim is a speaker-dependent prosodic model for synthetic speech and dialog systems. Furthermore, this flexible approach should not be limited to speaker-individual intonation.
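The abstract does not specify how the melody-only signals are produced. As a rough illustration of the general idea only, the following sketch turns a frame-wise F0 contour into a sine tone that follows the pitch and is silent in unvoiced frames. The function name `melody_signal`, the 10 ms frame period, the 16 kHz sample rate, and the convention that an F0 of 0 marks an unvoiced frame are all assumptions made for this example, not details taken from the paper.

```python
import math


def melody_signal(f0_frames, frame_period_s=0.01, sample_rate=16000, amplitude=0.5):
    """Return a list of samples for a sine tone that tracks an F0 contour.

    Hypothetical helper: f0_frames holds one F0 value in Hz per frame,
    with 0 (or a negative value) meaning "unvoiced". Each frame's F0 is
    held constant for the duration of the frame.
    """
    samples_per_frame = int(round(frame_period_s * sample_rate))
    samples = []
    phase = 0.0  # running phase, so the tone stays continuous across frames
    for f0 in f0_frames:
        for _ in range(samples_per_frame):
            if f0 > 0:
                # Advance the phase by one sample at the current frequency.
                phase += 2.0 * math.pi * f0 / sample_rate
                samples.append(amplitude * math.sin(phase))
            else:
                # Unvoiced frame: output silence and leave the phase untouched.
                samples.append(0.0)
    return samples


# Example: a short contour rising from 100 Hz to 120 Hz with one unvoiced frame.
contour = [100.0, 110.0, 0.0, 120.0]
signal = melody_signal(contour)
```

Accumulating the phase rather than computing `sin(2*pi*f0*t)` per frame avoids clicks at frame boundaries when F0 changes. A signal of this kind carries the pitch movement but no segmental content, which is the property a melody-only listening stimulus needs.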
Bibliographic reference. Mersdorf, Joachim J. / Domhover, Thomas (1997): "A perceptual study for modelling speaker-dependent intonation in TTS and dialog systems", In EUROSPEECH-1997, 867-870.