Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
A corpus-based concatenative speech synthesis system that uses no signal processing can produce intelligible synthetic speech that preserves the original voice characteristics. In such a system, it is very important to select waveform segments that are naturally close to the target prosody. With a database of limited size, however, natural prosody can sometimes be difficult to realize.
This paper describes an approach to unit (waveform segment) selection that improves intonation. We analyzed the pitch patterns of 503 sentences of read speech from a female Japanese speaker and obtained the F0 range of natural prosody. We then applied this range as a restriction on unit selection in the concatenative speech synthesizer. Subjective experiments confirmed that this measure significantly improved the intonational naturalness of the synthetic speech.
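The idea of restricting unit selection to a natural F0 range can be illustrated with a minimal sketch. The abstract does not say how the range was derived or how the restriction enters the selection cost, so everything here is an assumption made for illustration: the `Unit` type, the percentile bounds in `natural_f0_range`, the hard filter in `select_unit`, and the fallback when no candidate is in range are all hypothetical.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Unit:
    """Hypothetical candidate waveform segment with its mean F0 in Hz."""
    label: str
    mean_f0: float


def natural_f0_range(f0_values, low_pct=5, high_pct=95):
    """Estimate an F0 range from voiced F0 samples of the corpus.

    Using percentile bounds is an assumption; the paper may define
    the range differently.
    """
    # quantiles(..., n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(f0_values, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]


def select_unit(candidates, target_f0, f0_range):
    """Pick the candidate closest to the target F0 among those inside the range."""
    lo, hi = f0_range
    allowed = [u for u in candidates if lo <= u.mean_f0 <= hi]
    # Assumed fallback: if the restriction removes every candidate, ignore it.
    pool = allowed or candidates
    return min(pool, key=lambda u: abs(u.mean_f0 - target_f0))


if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    units = [Unit("a", 320.0), Unit("b", 270.0)]
    chosen = select_unit(units, target_f0=300.0, f0_range=(150.0, 310.0))
    print(chosen.label)
```

In this toy case the unrestricted nearest match would be the unit at 320 Hz, but the restriction excludes it because it lies outside the assumed range, so the in-range unit is chosen instead.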
Bibliographic reference. Fujisawa, Ken / Campbell, Nick (1998): "Prosody-Based Unit Selection For Japanese Speech Synthesis", In SSW3-1998, 181-184.