EUROSPEECH 2001 Scandinavia
The intelligibility and naturalness of synthetic speech depend strongly on its prosodic quality. Building on earlier work by Mixdorff on a linguistically motivated model of German intonation based on the Fujisaki model, this paper presents statistical results on the relationship between the linguistic and phonetic information underlying an utterance and its prosodic features. Statistical analysis yields, inter alia, the following strongest pairings of a single factor with a prosodic feature: boundary depth (right) with syllable duration; boundary depth (left) with phrase command magnitude Ap; and accent type (intoneme) with accent command amplitude Aa. These results were used to train an integrated prosodic model, based on a feed-forward neural network (FFNN), that predicts syllable durations along with syllable-aligned Fujisaki control parameters. Correlations between trained and predicted parameters suggest synergy effects, as they are mostly higher than the correlations obtained when each parameter is predicted individually from the same set of input features using a regression model. Informal listening tests with resynthesis examples showed encouraging results.
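For readers unfamiliar with the Fujisaki model mentioned above, the following is a minimal sketch of its standard formulation, in which log F0 is the sum of a base frequency Fb, phrase components scaled by Ap, and accent components scaled by Aa. It is an illustration only and not the authors' implementation: the function names, the constants alpha = 2.0/s, beta = 20.0/s and gamma = 0.9 (commonly used defaults), and all command timings and amplitudes are assumptions that do not come from the paper.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    if t < 0:
        return 0.0
    return alpha ** 2 * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, Ga(t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    phrase_cmds: list of (T0, Ap) pairs, onset time and magnitude.
    accent_cmds: list of (T1, T2, Aa) triples, onset, offset and amplitude.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


# Hypothetical commands, chosen only for illustration (not from the paper).
phrase_cmds = [(0.0, 0.5)]        # one phrase command at t = 0 s, Ap = 0.5
accent_cmds = [(0.3, 0.6, 0.4)]   # one accent from 0.3 s to 0.6 s, Aa = 0.4

# Sample the contour every 10 ms over 1.5 s.
contour = [fujisaki_f0(0.01 * i, 80.0, phrase_cmds, accent_cmds)
           for i in range(150)]
```

In a setup like the one the abstract describes, a predictor would supply values such as Ap and Aa aligned to syllables, and a function of this kind would turn them into an F0 contour for resynthesis.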
Bibliographic reference. Mixdorff, Hansjörg / Jokisch, Oliver (2001): "Building an integrated prosodic model of German", In EUROSPEECH-2001, 947-950.