International Symposium on Tonal Aspects of Languages
With Emphasis on Tone Languages

Beijing, China
March 28-31, 2004

Prosodic Word Boundaries Prediction for Mandarin Text-to-Speech

Yan-Qiu Shao, Ji-Qing Han, Ting Liu, Yong-Zhen Zhao

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

In Mandarin speech, the Prosodic Word (PW), rather than the Lexical Word (LW), is the basic rhythmic unit, and the naturalness of text-to-speech (TTS) output is directly influenced by how PWs are segmented. Most PWs are combinations of several LWs. In this paper, three models are designed to combine lexical words into prosodic words: a directed acyclic graph (DAG) model, a segmentation model, and a Markov Model (MM) combined with the Transformation-Based Error-Driven (TBED) learning algorithm. Because some long LWs should be broken into two or more PWs, a long-word break model is also applied to those LWs. Experimental results show that MM combined with TBED, plus the long-word break model, performs best of the three methods, achieving 93.00% precision and 93.23% recall.
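The precision and recall figures above are scores on predicted PW boundaries. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch assuming boundaries are scored by their character position in the sentence: precision is the fraction of predicted boundaries that also appear in the reference segmentation, and recall is the fraction of reference boundaries that were predicted. The function names `boundary_offsets` and `boundary_precision_recall` and the example segmentation are hypothetical.

```python
def boundary_offsets(words):
    """Return the set of character offsets at which a word boundary falls.

    The sentence-final position is excluded because it is always a boundary
    and would inflate both scores. (Hypothetical helper for illustration.)
    """
    offsets, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        offsets.add(pos)
    return offsets


def boundary_precision_recall(predicted_pws, gold_pws):
    """Boundary-level precision and recall of a predicted PW segmentation.

    Assumes both segmentations cover exactly the same character string.
    """
    if "".join(predicted_pws) != "".join(gold_pws):
        raise ValueError("segmentations must cover the same text")
    pred = boundary_offsets(predicted_pws)
    gold = boundary_offsets(gold_pws)
    hits = len(pred & gold)  # boundaries placed at the correct position
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: the reference has PW boundaries after 我们 and 都是,
# but the prediction merges 我们都是 into one PW and finds only the second.
p, r = boundary_precision_recall(["我们都是", "中国人"], ["我们", "都是", "中国人"])
print(p, r)  # 1.0 0.5
```

Under these assumptions, a model that under-splits (merges too many LWs into one PW) keeps precision high while losing recall, and one that over-splits does the reverse, which is why both numbers are reported together.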


Bibliographic reference.  Shao, Yan-Qiu / Han, Ji-Qing / Liu, Ting / Zhao, Yong-Zhen (2004): "Prosodic word boundaries prediction for Mandarin text-to-speech", In TAL-2004, 159-162.