Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Prosody-Based Unit Selection For Japanese Speech Synthesis

Ken Fujisawa, Nick Campbell

ATR Interpreting Telecommunications Research Labs., Seika-cho, Soraku-gun, Kyoto, Japan

A corpus-based concatenative speech synthesis system that uses no signal processing can produce intelligible synthetic speech while maintaining the original voice characteristics. In such a system, it is very important to select waveform segments whose prosody is already close to the target prosody. With a database of limited size, however, it can sometimes be difficult to realize natural prosody.

This paper describes an approach to unit (waveform segment) selection that improves intonation. We analyzed the pitch patterns of 503 read-speech sentences spoken by a female Japanese speaker and obtained the F0 range of natural prosody. We then applied this range as a restriction on unit selection in the concatenative speech synthesizer. Subjective experiments confirmed that this measure significantly improved the intonational naturalness of the synthetic speech.
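
The idea of restricting unit selection to a natural F0 range can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify how the range was derived or applied. The sketch assumes that the range is a percentile band over F0 values observed in the corpus, that each candidate unit carries a single mean F0 value, and that selection falls back to the unrestricted pool when no candidate lies in range. The names `Unit`, `natural_f0_range`, and `select_unit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A candidate waveform segment (hypothetical, simplified representation)."""
    label: str     # phone label of the segment
    f0_hz: float   # mean F0 of the segment in Hz (assumed to be precomputed)


def natural_f0_range(f0_values, lower_pct=5.0, upper_pct=95.0):
    """Estimate an F0 range as a percentile band of observed values.

    The percentile band is an assumption made for this sketch; the
    abstract only says that a natural F0 range was obtained.
    """
    ordered = sorted(f0_values)
    if not ordered:
        raise ValueError("no F0 values given")

    def pick(pct):
        # Nearest-rank style lookup into the sorted values.
        return ordered[round(pct / 100.0 * (len(ordered) - 1))]

    return pick(lower_pct), pick(upper_pct)


def select_unit(candidates, target_f0_hz, f0_range):
    """Pick the candidate closest to the target F0, restricted to f0_range."""
    low, high = f0_range
    in_range = [u for u in candidates if low <= u.f0_hz <= high]
    # Assumed fallback: if the restriction removes every candidate,
    # choose from the full candidate list instead.
    pool = in_range or candidates
    return min(pool, key=lambda u: abs(u.f0_hz - target_f0_hz))
```

In this sketch the restriction acts as a hard filter applied before the usual closest-match choice, so a unit with an implausible F0 is never chosen while an in-range alternative exists. A real system would more likely fold such a constraint into a fuller target and concatenation cost.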

Bibliographic reference. Fujisawa, Ken / Campbell, Nick (1998): "Prosody-Based Unit Selection For Japanese Speech Synthesis", In SSW3-1998, 181-184.