Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Syllable-Based Thai Duration Model using Multi-Level Linear Regression and Syllable Accommodation

Chatchawarn Hansakunbuntheung (1), Hiroaki Kato (2), Yoshinori Sagisaka (1)

(1) GITI/Language and Speech Science Research Laboratory, Waseda University, Tokyo, Japan
(2) NICT/ATR Cognitive Information Science Labs, Kyoto, Japan

This paper proposes a syllable-based Thai duration model using multi-level linear regression and syllable accommodation. To build a timing model that directly reflects control characteristics, we present two analyses of hierarchical timing control. The first analysis showed that the syllable is highly correlated with timing controls above the phone level, whereas phone differences by themselves do not affect higher-level control and contribute only to local timing control. The second analysis, on syllable accommodation, showed that phone duration depends strongly on local phone factors. These results support the syllable-based hierarchical model proposed in this paper. Duration prediction experiments with 5-fold cross-validation yielded RMS errors of 46.73 ms and 32.37 ms, and correlations between measured and predicted durations of 0.905 and 0.811, at the syllable and phone levels, respectively. A comparison of prediction accuracy showed that the proposed syllable-based multi-level duration model outperformed a conventional single-level phone duration model.
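For readers unfamiliar with the two evaluation measures quoted above, the following is a minimal sketch of how RMS error and correlation between measured and predicted durations might be computed, together with one simple way to split data into five folds. The function names (`rms_error`, `correlation`, `five_fold_indices`), the round-robin fold assignment, and the toy duration values are illustrative assumptions, not details taken from the paper.

```python
import math


def rms_error(measured, predicted):
    """Root-mean-square error, in the same unit as the inputs (e.g. ms)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def correlation(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_m * var_p)


def five_fold_indices(n_items, n_folds=5):
    """Yield (train, test) index lists; item i is assigned to fold i % n_folds.

    The round-robin assignment is an assumption made for this sketch only.
    """
    for fold in range(n_folds):
        test = [i for i in range(n_items) if i % n_folds == fold]
        train = [i for i in range(n_items) if i % n_folds != fold]
        yield train, test


# Toy durations in milliseconds (hypothetical values, not data from the paper).
measured_ms = [100.0, 200.0, 300.0]
predicted_ms = [110.0, 190.0, 310.0]
print(rms_error(measured_ms, predicted_ms))
print(correlation(measured_ms, predicted_ms))
```

In a cross-validation setting, a model would be fitted on each `train` split and the two measures computed on the corresponding `test` split, then aggregated across folds.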

