First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Statistical Analysis for Segmental Duration Rules in Japanese Speech Synthesis

Nobuyoshi Kaiki (1), Kazuya Takeda (2), Yoshinori Sagisaka (1)

(1) ATR Interpreting Telephony Research Laboratories, Kyoto, Japan
(2) KDD Kamifukuoka R & D Laboratories, Saitama, Japan

This paper presents a statistical analysis of duration control factors in Japanese speech data uttered by four speakers. Previous studies identified the important factors as phoneme category, neighboring phonemes, position in the breath group, and mora count of the breath group. In addition to these, we introduce several new control factors: position in the phrase, mora count of the phrase, content/function word category, pre- and post-adjacent phonemes, and temporal compensation caused by geminated consonants. Using the factors found to be statistically significant, we propose a vowel duration model for Japanese speech synthesis. In duration prediction experiments with this model, the root mean square error between predicted and observed durations was 15.30 ms (19.6% of the average duration) for vowels in the training set and 15.84 ms (19.9% of the average duration) for vowels in the test set.
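
As an illustration of the evaluation measure quoted above, the following minimal sketch computes a root mean square error and its ratio to the mean observed duration. It is not the authors' code: the function names and the duration values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences (in ms)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(predicted, observed):
    """RMSE expressed as a fraction of the mean observed duration."""
    mean_observed = sum(observed) / len(observed)
    return rmse(predicted, observed) / mean_observed


# Hypothetical vowel durations in milliseconds (not data from the paper).
observed_ms = [80.0, 60.0, 100.0, 80.0]
predicted_ms = [70.0, 70.0, 90.0, 90.0]

error_ms = rmse(predicted_ms, observed_ms)
ratio = relative_rmse(predicted_ms, observed_ms)
print(f"RMSE = {error_ms:.2f} ms ({ratio:.1%} of the mean observed duration)")
```

Reporting the error relative to the mean observed duration, as the abstract does, makes the training-set and test-set figures comparable even when the two sets have slightly different average vowel lengths.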

