Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis

Xingyu Na (1), Chaomin Wang (1), Xiang Xie (1), Jingming Kuang (1), Yaling He (2)

(1) School of Information and Electronics, Beijing Institute of Technology, China
(2) Eastel Corporation, Beijing, China

A tone generation method that maximizes the joint likelihood of syllabic HMMs is proposed to improve Mandarin speech synthesis. The F0 sequence is generated by jointly maximizing the likelihoods of a state-level F0 model and a syllable-level tone model, under a constraint on the mean F0 of adjacent units. The optimal weight of the tone component is found by a search that uses the parameter generation error and the correlation coefficient as criteria. Both objective and subjective evaluations confirm the benefit of the method: the generation error is reduced by 26.7%, the correlation coefficient is increased by 6.5%, and the perceived prosody is significantly improved.
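A short sketch can make the idea of a weighted joint likelihood concrete. It is not the authors' formulation. It rests on several simplifying assumptions: each frame's log-F0 has a single Gaussian from the state-level model, given by `frame_means` and `frame_vars`. Only static features are used, with no delta constraints. The syllable-level tone model is reduced to one Gaussian on the syllable's mean log-F0, given by `tone_mean` and `tone_var`. The function name `joint_ml_f0` and the parameter `weight` are hypothetical. Under these assumptions, maximizing the weighted log-likelihood means minimizing a convex quadratic. That quadratic has the closed-form solution computed below.

```python
from typing import List


def joint_ml_f0(frame_means: List[float], frame_vars: List[float],
                tone_mean: float, tone_var: float, weight: float) -> List[float]:
    """Hypothetical sketch: maximize a weighted joint Gaussian log-likelihood.

    Minimizes  sum_t (x_t - m_t)^2 / v_t + weight * (mean(x) - tone_mean)^2 / tone_var
    over the frame values x_t. Assumes static log-F0 only, no dynamic features,
    and a tone model that constrains only the syllable mean.
    """
    n = len(frame_means)
    if n == 0 or n != len(frame_vars):
        raise ValueError("frame_means and frame_vars must be non-empty and equal length")
    m_bar = sum(frame_means) / n  # mean of state-level means
    v_bar = sum(frame_vars) / n   # mean of state-level variances
    # Setting the gradient to zero gives x_t = m_t - v_t * c, with the scalar c below.
    c = weight * (m_bar - tone_mean) / (tone_var * n + weight * v_bar)
    # Frames with larger state variance are pulled further toward the tone target.
    return [m - v * c for m, v in zip(frame_means, frame_vars)]


if __name__ == "__main__":
    # Illustrative values only (log-F0), not data from the paper.
    print(joint_ml_f0([5.0, 5.1, 5.2], [0.01, 0.02, 0.01], 5.0, 0.001, 1.0))
```

With `weight = 0` the state-level means are returned unchanged. As `weight` grows, the syllable mean is pulled toward `tone_mean`. In this simplified setting, searching for the best tone weight amounts to sweeping `weight` and scoring each generated contour against natural F0, for example by RMSE and correlation.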

Index Terms: speech synthesis, F0 contour, tone generation, speech prosody

