Speech Prosody 2006

Dresden, Germany
May 2-5, 2006

Duration Prediction in Mandarin TTS System

Qing Guo (1), Nobuyuki Katae (2)

(1) Fujitsu Research and Development Center China, Beijing, China
(2) Fujitsu Laboratories Ltd., Japan

This paper reports the methodology and results of decision-tree-based duration prediction for a Mandarin text-to-speech system developed by Fujitsu Laboratories. Syllable initials and finals are the basic units in this duration study. Factors influencing the duration of finals, such as phrase boundary and phone context, are discussed in detail. Experiments indicate that the most important determinant of the duration of a final is whether the right phrase boundary level is below the prosodic word level. Furthermore, the degree of vowel lengthening at a phrase boundary may vary with the type of final. The paper also explains methods for objective evaluation of the duration prediction model. Lastly, prosody evaluation results indicate that the prosody generated by our prosody generation module is much better than that of two other popular Mandarin TTS systems.
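As a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a regression tree that predicts the duration of a final from categorical context features, followed by an RMSE computation as one possible objective measure. It is not the authors' implementation. The feature names (`boundary`, `final`), their values, the splitting criterion, the choice of RMSE, and all duration values in milliseconds are made-up assumptions for illustration only.

```python
import math
from statistics import mean

def sse(ys):
    """Sum of squared errors of ys around their mean."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, ys, min_leaf=1):
    """Grow a regression tree using binary questions 'feature == value'."""
    best = None
    for feat in sorted(rows[0]):
        for val in sorted({r[feat] for r in rows}):
            yes = [i for i, r in enumerate(rows) if r[feat] == val]
            no = [i for i, r in enumerate(rows) if r[feat] != val]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue
            cost = sse([ys[i] for i in yes]) + sse([ys[i] for i in no])
            if best is None or cost < best[0]:
                best = (cost, feat, val, yes, no)
    # Stop when no admissible question reduces the squared error.
    if best is None or best[0] >= sse(ys) - 1e-9:
        return {"leaf": mean(ys)}
    _, feat, val, yes, no = best
    return {
        "feat": feat,
        "val": val,
        "yes": build_tree([rows[i] for i in yes], [ys[i] for i in yes], min_leaf),
        "no": build_tree([rows[i] for i in no], [ys[i] for i in no], min_leaf),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean duration."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feat"]] == tree["val"] else tree["no"]
    return tree["leaf"]

def rmse(pred, obs):
    """Root mean squared error between predicted and observed durations."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))

# Toy data (hypothetical): right boundary level and final type -> duration in ms.
rows = [
    {"boundary": "none", "final": "a"},
    {"boundary": "none", "final": "ang"},
    {"boundary": "word", "final": "a"},
    {"boundary": "word", "final": "ang"},
    {"boundary": "phrase", "final": "a"},
    {"boundary": "phrase", "final": "ang"},
]
durations = [100, 120, 170, 190, 190, 220]

tree = build_tree(rows, durations, min_leaf=2)
predictions = [predict(tree, r) for r in rows]
print("root question:", tree["feat"], "==", tree["val"])
print("training RMSE (ms):", rmse(predictions, durations))
```

On this toy data the root question separates finals with no following boundary from all others, which only mirrors the abstract's finding because the numbers were chosen that way. A real evaluation would compute the error on held-out syllables rather than on the training set.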


Bibliographic reference.  Guo, Qing / Katae, Nobuyuki (2006): "Duration prediction in Mandarin TTS system", In SP-2006, paper 032.