EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Building An Integrated Prosodic Model of German

Hansjörg Mixdorff (1), Oliver Jokisch (2)

(1) Berlin University of Applied Sciences, Germany
(2) Dresden University of Technology, Germany

The intelligibility and naturalness of synthetic speech strongly depend on its prosodic quality. Building on earlier work by Mixdorff on a linguistically motivated model of German intonation based on the Fujisaki model, the current paper presents statistical results on the relationship between the linguistic and phonetic information underlying an utterance and its prosodic features. Statistical analysis yields, inter alia, the following pairs of strongest single factor and prosodic feature: boundary depth (right) - syllable duration; boundary depth (left) - phrase command magnitude Ap; accent type (intoneme) - accent command amplitude Aa. These results were used to train an integrated prosodic model, based on a feed-forward neural network (FFNN), that predicts syllable durations together with syllable-aligned Fujisaki control parameters. Correlations between trained and predicted parameters suggest synergy effects: they are mostly higher than the correlations obtained when each parameter is predicted individually from the same set of input features using a regression model. Informal listening tests with resynthesis examples showed encouraging results.
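For readers unfamiliar with the Fujisaki model, the following is a minimal sketch of how its standard formulation turns phrase commands (magnitude Ap) and accent commands (amplitude Aa) into an F0 contour: the logarithm of F0 is the sum of a base value, the phrase component, and the accent component. The function names, the time constants alpha and beta, the ceiling gamma, and all command values are illustrative assumptions and are not taken from the paper.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism, Gp(t).

    alpha (1/s) is an assumed, illustrative time constant.
    """
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, Ga(t), capped at gamma.

    beta (1/s) and gamma are assumed, illustrative values.
    """
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency Fb in Hz
    phrase_cmds -- list of (T0, Ap): onset time and magnitude
    accent_cmds -- list of (T1, T2, Aa): onset time, offset time, amplitude
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


# Hypothetical utterance: one phrase command at 0 s, one accent from 0.3 s to 0.6 s.
phrases = [(0.0, 0.5)]
accents = [(0.3, 0.6, 0.4)]
contour = [fujisaki_f0(i * 0.01, 80.0, phrases, accents) for i in range(150)]
```

In a setup like the one the abstract describes, a prosodic model would supply values such as Ap and Aa per syllable, and a generator of this kind would then produce the contour used for resynthesis; how the paper aligns commands to syllables is not specified here.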


Bibliographic reference. Mixdorff, Hansjörg / Jokisch, Oliver (2001): "Building an integrated prosodic model of German", in EUROSPEECH-2001, 947-950.