Fourth ISCA ITRW on Speech Synthesis
August 29 - September 1, 2001
We use Stem-ML to build an automatic learning system for Mandarin prosody that allows us to make quantitative measurements of prosodic strengths. Stem-ML is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds. Because Stem-ML describes the interactions between nearby tones or accents, we were able to use a highly constrained model with only one accent template for each lexical tone category and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of F0. The results reveal strong alternating metrical patterns in words and show that the speaker uses word strength to mark a hierarchy of boundaries.
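As a rough illustration of what "capturing 87% of the variance of F0" can mean, the sketch below computes the fraction of variance explained as one minus the ratio of the residual sum of squares to the total sum of squares. This definition is an assumption; the abstract does not state the exact formula used. The function name `variance_explained` and the F0 values are hypothetical and invented for the example only.

```python
def variance_explained(measured, predicted):
    """Fraction of the variance of `measured` captured by `predicted`.

    Computed as 1 - SS_res / SS_tot. This is an assumed definition,
    not necessarily the one used in the paper.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(measured) / len(measured)
    # Total variation of the measured F0 around its mean.
    ss_tot = sum((m - mean) ** 2 for m in measured)
    # Variation left unexplained by the model.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical F0 samples in Hz, invented for illustration only.
f0_measured = [200.0, 220.0, 180.0, 210.0]
f0_model = [205.0, 215.0, 185.0, 205.0]
score = variance_explained(f0_measured, f0_model)
print(f"fraction of F0 variance explained: {score:.3f}")
```

A value of 0.87 under this definition would mean that the residual between modeled and measured F0 carries 13% of the variation seen in the measured contour.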
Bibliographic reference. Kochanski, Greg P. / Shih, Chilin / Jing, Hongyan (2001): "Hierarchical structure and word strength prediction of Mandarin prosody", In SSW4-2001, paper 130.