Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Maximum-Likelihood Dynamic Intonation Model for Concatenative Text-to-Speech System

Slava Shechtman

IBM Research Laboratory, Haifa, Israel

In this work we present a Maximum Likelihood (ML) joint pitch-curve model inspired by the HMM-based TTS synthesis concept. The model provides the ML-optimal solution for the coarse target intonation curve (three points per syllable) and incorporates both static and dynamic pitch values for better utterance-level intonation modeling. The coarse intonation curve may optionally be combined with the original pitch extracted from the concatenated units, using a technique called microprosody preservation, which is also described. This technique is intended to reduce the pitch modification ratio and improve the naturalness of the synthesized speech in large-scale concatenative TTS systems. The proposed model was successfully applied to IBM’s trainable concatenative TTS system, improving the subjective intonation quality.
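The "static and dynamic" joint modeling follows the parameter-generation idea from HMM-based synthesis. Given a Gaussian prediction for each coarse point's pitch value and for its change from the previous point, the curve that maximizes the joint likelihood is the solution of a linear system. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes one Gaussian per coarse point (for example, three per syllable), a first-difference delta `c[t] - c[t-1]`, and diagonal variances. The function names `ml_pitch_curve` and `solve_tridiagonal` are hypothetical.

```python
from typing import List


def solve_tridiagonal(lower: List[float], diag: List[float],
                      upper: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm for a tridiagonal system A x = rhs.

    lower[i] holds A[i][i-1] (lower[0] unused); upper[i] holds A[i][i+1]
    (upper[-1] unused).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0] if n > 1 else 0.0
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def ml_pitch_curve(static_mean: List[float], static_var: List[float],
                   delta_mean: List[float], delta_var: List[float]) -> List[float]:
    """Hypothetical ML pitch-curve generation from static + delta Gaussians.

    Minimizes sum_t (c[t]-ms[t])^2/vs[t] + sum_{t>=1} (c[t]-c[t-1]-md[t])^2/vd[t],
    which maximizes the joint Gaussian log-likelihood under the assumed
    first-difference delta. delta_mean[0] and delta_var[0] are ignored
    because the first point has no predecessor.
    """
    n = len(static_mean)
    # Static terms contribute to the diagonal and the right-hand side.
    diag = [1.0 / v for v in static_var]
    rhs = [m / v for m, v in zip(static_mean, static_var)]
    lower = [0.0] * n
    upper = [0.0] * n
    # Each delta term couples point t with point t-1 (normal equations).
    for t in range(1, n):
        w = 1.0 / delta_var[t]
        diag[t] += w
        diag[t - 1] += w
        lower[t] -= w      # A[t][t-1]
        upper[t - 1] -= w  # A[t-1][t]
        rhs[t] += w * delta_mean[t]
        rhs[t - 1] -= w * delta_mean[t]
    return solve_tridiagonal(lower, diag, upper, rhs)


if __name__ == "__main__":
    # Illustrative values only: 2 syllables x 3 coarse points, pitch in Hz.
    curve = ml_pitch_curve(
        static_mean=[120.0, 140.0, 130.0, 125.0, 135.0, 110.0],
        static_var=[100.0] * 6,
        delta_mean=[0.0, 15.0, -5.0, -5.0, 5.0, -20.0],
        delta_var=[25.0] * 6,
    )
    print([round(f, 1) for f in curve])
```

In this sketch, each delta couples only neighbouring points, so the normal equations are tridiagonal and can be solved in linear time. A point whose static prediction is uncertain (large variance) is pulled toward a value that is consistent with the predicted movements around it.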


Bibliographic reference.  Shechtman, Slava (2007): "Maximum-likelihood dynamic intonation model for concatenative text-to-speech system", In SSW6-2007, 234-239.