Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
One of the major problems in the phonetic realisation of synthetic intonation is the modelling of local segmental effects on the course of F0, even though in a system using concatenation synthesis these effects are already present in the units recorded from natural speech. The usual procedure in concatenation synthesis is to remove all F0 information from the units and then reimpose a synthetic F0 contour; however, the original F0 information can instead be exploited to model local segmental effects and thereby produce a more natural F0 contour. This paper proposes a method for extracting microprosodic information from the F0 contours of diphones recorded from natural speech, and presents some preliminary results of its application in the TTS system developed at Edinburgh University's Centre for Speech Technology Research (CSTR). Problems with the work reported here are also discussed, as are directions for future research both on microprosody and on evaluation experiments.
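The abstract does not specify how the microprosodic information is extracted. Purely as an illustration of the general idea (keep the local F0 perturbations of a natural diphone rather than discarding them), the following is a minimal sketch that is not the paper's method. It assumes that the macro-level trend of a diphone's F0 can be approximated by a least-squares straight line, that the residual around that line carries the segmental (microprosodic) effects, and that every frame is voiced. The function names `linear_trend`, `extract_microprosody` and `reimpose` and the `weight` parameter are hypothetical.

```python
from typing import List


def linear_trend(f0: List[float]) -> List[float]:
    """Least-squares straight line through F0 samples (Hz), one value per frame.

    Assumption: this line stands in for the intonational (macro) component.
    """
    n = len(f0)
    if n < 2:
        return list(f0)
    mean_x = (n - 1) / 2
    mean_y = sum(f0) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(f0))
    slope = sxy / sxx
    return [mean_y + slope * (x - mean_x) for x in range(n)]


def extract_microprosody(diphone_f0: List[float]) -> List[float]:
    """Residual of the natural F0 after removing the linear trend.

    Assumption: this residual approximates local segmental effects on F0.
    """
    trend = linear_trend(diphone_f0)
    return [y - t for y, t in zip(diphone_f0, trend)]


def reimpose(synthetic_f0: List[float], residual: List[float],
             weight: float = 1.0) -> List[float]:
    """Superimpose the (optionally scaled) residual on a synthetic contour."""
    if len(synthetic_f0) != len(residual):
        raise ValueError("contours must have the same number of frames")
    return [s + weight * r for s, r in zip(synthetic_f0, residual)]


if __name__ == "__main__":
    natural = [120.0, 118.0, 125.0, 121.0]  # hypothetical diphone F0 (Hz)
    residual = extract_microprosody(natural)
    print(residual)                          # [0.5, -2.5, 3.5, -1.5]
    print(reimpose([100.0] * 4, residual))   # [100.5, 97.5, 103.5, 98.5]
```

Removing only a linear trend keeps the sketch simple; a real system would also have to handle unvoiced frames and align the residual to the synthetic contour's timing, which this example does not attempt.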
Bibliographic reference. Monaghan, Alex I. C. (1992): "Extracting microprosodic information from diphones - a simple way to model segmental effects on prosody for synthetic speech", In ICSLP-1992, 1159-1162.