Speech Prosody 2002
This paper builds predictive models of segment duration in context for Korean text-to-speech, using CART models and "additive-multiplicative" models. It uses a corpus of 670 read sentences collected from one speaker of standard Korean. The best performance was obtained with a CART decision tree model, for which the correlation between the observed and predicted durations is 0.77 and the mean squared error of prediction is 25.11 ms. Linguistic implications of these models are also discussed. Perceptual evaluations of the models are carried out using a Korean diphone database based on the MBROLA synthesis system, in order to investigate the clarity of the predicted durations and listener preference for them.
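The two figures of merit quoted above, the correlation between observed and predicted durations and the prediction error, can be illustrated with a minimal Python sketch. The function names and the toy duration values are assumptions made for illustration only; they are not taken from the paper, and the abstract does not specify exactly how its error figure was computed.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted durations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_squared_error(observed, predicted):
    """Average of the squared prediction errors (units: ms squared)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical segment durations in milliseconds, invented for this example.
observed_ms = [50.0, 80.0, 120.0, 65.0]
predicted_ms = [55.0, 75.0, 110.0, 70.0]

r = pearson_r(observed_ms, predicted_ms)
mse = mean_squared_error(observed_ms, predicted_ms)
rmse = math.sqrt(mse)  # square root brings the error back to milliseconds

print(f"correlation = {r:.2f}, MSE = {mse:.2f} ms^2, RMSE = {rmse:.2f} ms")
```

In a real evaluation, the two lists would hold the measured durations of held-out segments and the durations predicted for them by the model.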
Bibliographic reference. Chung, Hyunsong (2002): "Duration models and the perceptual evaluation of spoken Korean", In SP-2002, 219-222.