Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Two-Step Generation of Mandarin F0 Contours Based on Tone Nucleus and Superpositional Models

Qinghua Sun (1), Keikichi Hirose (2), Nobuaki Minematsu (3)

(1) Graduate School of Engineering; (2) Graduate School of Information Science and Technology; (3) Graduate School of Frontier Sciences; University of Tokyo, Japan

We developed a two-step scheme for synthesizing the fundamental frequency (F0) contours of Mandarin sentences. The method represents the logarithmic F0 contour of a sentence as tone components superposed on phrase components, as in the generation process model of F0 contours (F0 model). The tone components are realized by concatenating tone nucleus F0 patterns generated by a corpus-based method, while the phrase components are generated by rules within the F0 model framework. In the two-step scheme, the phrase components are generated first, and their information is added to the inputs used to predict the tone nucleus F0 patterns. Listening tests on speech synthesized with the generated F0 contours verified the validity of the scheme. For comparison, we also generated F0 contours without decomposing them into tone and phrase components, as most existing methods do. In terms of the naturalness of the synthetic speech, the proposed method showed no clear advantage. In terms of flexibility, however, its advantage was clear: manipulating the phrase components realized better focus control.
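The log-domain superposition and the two-step ordering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard phrase control mechanism of the F0 model, alpha^2 * t * exp(-alpha * t), with an assumed alpha of 3.0 per second. The corpus-based tone nucleus generator is not described here, so the tone component is left as an arbitrary callable. The helper names (`phrase_control`, `phrase_component`, `log_f0`, `tone_nucleus_features`) and the tuple layouts for phrase commands and tone nuclei are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical data layouts (assumptions, not taken from the paper):
#   phrase command = (onset time T0 in seconds, magnitude Ap)
#   tone nucleus   = (start in seconds, end in seconds, tone label 1-4)
PhraseCommand = Tuple[float, float]
ToneNucleus = Tuple[float, float, int]


def phrase_control(t: float, alpha: float) -> float:
    """Phrase control mechanism of the F0 model: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def phrase_component(t: float, commands: List[PhraseCommand], alpha: float = 3.0) -> float:
    """Sum of the responses to all phrase commands at time t (alpha = 3.0 /s is an assumed value)."""
    return sum(ap * phrase_control(t - t0, alpha) for t0, ap in commands)


def log_f0(t: float, fb: float, commands: List[PhraseCommand],
           tone_component: Callable[[float], float], alpha: float = 3.0) -> float:
    """ln F0(t) = ln Fb + phrase component + tone component (superposition in the log domain)."""
    return math.log(fb) + phrase_component(t, commands, alpha) + tone_component(t)


def tone_nucleus_features(nuclei: List[ToneNucleus], commands: List[PhraseCommand],
                          alpha: float = 3.0) -> List[Dict[str, float]]:
    """Step 2 inputs (hypothetical): tone label plus the step-1 phrase component at each nucleus midpoint."""
    features = []
    for start, end, tone in nuclei:
        mid = 0.5 * (start + end)
        features.append({"tone": tone, "phrase_at_mid": phrase_component(mid, commands, alpha)})
    return features


if __name__ == "__main__":
    # Step 1: phrase components from (hypothetical) rule-generated phrase commands.
    commands = [(0.0, 0.5), (1.2, 0.3)]
    # Step 2: phrase information joins the tone labels as predictor inputs.
    nuclei = [(0.10, 0.25, 1), (0.35, 0.50, 4)]
    print(tone_nucleus_features(nuclei, commands))
    # With a zero tone component, the contour reduces to ln Fb plus the phrase components.
    contour = [log_f0(0.01 * i, 80.0, commands, lambda t: 0.0) for i in range(200)]
    print(round(math.exp(max(contour)), 1))
```

A trained tone nucleus predictor (not shown) would map these feature dictionaries to F0 patterns. Concatenating those patterns would yield the `tone_component` callable that `log_f0` adds on top of the phrase components.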


