Speech Prosody 2004

Nara, Japan
March 23-26, 2004

Analysis of Segmental Duration for Thai Speech Synthesis

Chatchawarn Hansakunbuntheung (1), Yoshinori Sagisaka (2)

(1) Information R&D Division, National Electronics and Computer Technology Center, Thailand
(2) Global Information and Telecommunication Institute, Waseda University, Tokyo, Japan

This paper presents a study of the characteristics of Thai segmental duration and applies the analysis results to construct a Thai phone duration model for Thai speech synthesis. The study uses Hayashi's categorized linear regression model to analyze the effects of various factors, including the current phoneme itself, the surrounding phonemes, phone position in the word, phone position in the phrase, part of speech, and Thai tone. These factors are combined to form a Thai phone duration model. The model gives a rather high correlation of 0.788. Though it has a fairly high RMS error of 33.14 ms, an evaluation shows that the model is highly consistent on unknown data.
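As a rough illustration of the kind of model the abstract describes, the sketch below fits a linear regression on categorical factors by least squares over indicator (dummy) variables, then reports the correlation and RMS error between observed and predicted durations. It is a minimal sketch under stated assumptions, not the authors' implementation: the factor names, levels, and duration values are made up for illustration, only two of the listed factors are used, and the helper names `fit_categorical_regression`, `pearson`, and `rmse` are hypothetical.

```python
import math


def fit_categorical_regression(rows, durations):
    """Least-squares fit of duration = intercept + one score per factor level.

    rows: list of tuples of categorical labels (one entry per factor).
    The first (sorted) level of each factor is the reference with score 0.
    Returns a function mapping a row of labels to a predicted duration.
    """
    n_factors = len(rows[0])
    levels = [sorted({r[j] for r in rows}) for j in range(n_factors)]
    columns = [(j, lv) for j in range(n_factors) for lv in levels[j][1:]]

    def encode(row):
        # Intercept followed by one 0/1 indicator per non-reference level.
        return [1.0] + [1.0 if row[j] == lv else 0.0 for j, lv in columns]

    X = [encode(r) for r in rows]
    p = len(X[0])
    # Augmented normal equations: [X^T X | X^T y].
    A = [
        [sum(x[a] * x[b] for x in X) for b in range(p)]
        + [sum(x[a] * y for x, y in zip(X, durations))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("singular design matrix: add data or drop a factor")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    weights = [A[i][p] / A[i][i] for i in range(p)]

    return lambda row: sum(w * x for w, x in zip(weights, encode(row)))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Made-up toy data: (phoneme, position in word) -> duration in ms.
toy_rows = [
    ("a", "initial"), ("a", "final"),
    ("aa", "initial"), ("aa", "final"),
    ("k", "initial"), ("k", "final"),
]
toy_durations = [82.0, 110.0, 150.0, 171.0, 61.0, 95.0]

predict = fit_categorical_regression(toy_rows, toy_durations)
predicted = [predict(r) for r in toy_rows]
print("correlation:", round(pearson(toy_durations, predicted), 3))
print("RMS error (ms):", round(rmse(toy_durations, predicted), 2))
```

In this sketch each factor level contributes an additive score to the predicted duration, which is the general shape of a regression on categorical factors. The figures it prints come from the toy data only and say nothing about the paper's reported correlation or RMS error.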


Bibliographic reference. Hansakunbuntheung, Chatchawarn / Sagisaka, Yoshinori (2004): "Analysis of segmental duration for Thai speech synthesis", in SP-2004, 479-482.