4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
A dynamic model for synthesizing intonation is presented. The model rests on the following assumptions: intonation results from superposed, independent prototypical gestures belonging to diverse linguistic levels (sentence, clause, group, subgroup...), and these prototypical movements are progressively stored in a prosodic lexicon and used by the speaker in given communication tasks. Our most recent implementation of this model is an association of sequential neural networks (SNNs): each dynamic module is in charge of the melodic prediction of a specific linguistic level, and the resulting melody is the weighted sum of the SNN outputs. We focus here on the sentence level. We built a corpus of utterances of various lengths pronounced with six attitudes, then designed SNNs able to expand the prosodic sentence movement to the length of the utterance. Preliminary results show that these simple SNNs give acceptable F0 predictions and preserve the essential features of each attitude whatever the syllabic length of the sentence.
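The superposition step described above can be illustrated with a minimal sketch: one predicted F0 contour per linguistic level, combined as a weighted sum. The function and variable names here are illustrative assumptions, not the paper's actual implementation, and the per-level contours stand in for the outputs the SNN modules would produce.

```python
import numpy as np

def superpose_contours(contours, weights):
    """Combine per-level F0 contours into one melody by weighted sum.

    `contours` is a (levels x frames) array, one row per linguistic
    level (sentence, clause, group, ...); `weights` holds one weight
    per level. Both names are hypothetical, chosen for this sketch.
    """
    contours = np.asarray(contours, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum across levels -> a single contour over the frames
    return weights @ contours

# Toy example: two levels, contours sampled at four points
sentence_level = [1.0, 0.5, 0.0, -0.5]   # slow sentence-scale gesture
group_level = [0.2, -0.2, 0.2, -0.2]     # faster group-scale gesture
f0 = superpose_contours([sentence_level, group_level], [1.0, 0.5])
# f0 is [1.1, 0.4, 0.1, -0.6]
```

In the model itself each row would come from a trained SNN module for its level; the sketch only shows how their independent contributions add up into the final melody.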
Bibliographic reference. Morlec, Yann / Bailly, Gérard / Aubergé, Véronique (1996): "Generating intonation by superposing gestures", In ICSLP-1996, 283-286.