EUROSPEECH 2001 Scandinavia
A dynamical model of rhythm production is presented. The model is designed to generate segmental durations from the interplay between a dynamical rhythmic system and a gestural score representation. The rhythmic level is implemented by a coupled-oscillator system (composed of a syllabic oscillator and a phrase-stress oscillator) that delivers V-to-V-size beats to the gestural score. The model automatically generates the acoustic durations of segments and pauses according to an input speech rate. Both the coupling of the two oscillators and the interaction between the rhythmic system and a linguistic description of the sentences are achieved by a recurrent neural network. The network delivers normalized syllable-size durations, which are then statistically distributed among the segments. The model exhibits cognitively plausible language-universal and language-specific phonetic properties, in sharp contrast with output-oriented speech generation techniques that do not take the underlying speech production mechanism into account.
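The abstract does not say how a syllable-size (V-to-V) duration is statistically distributed among its segments. One common technique, assumed here rather than taken from the paper, is the z-score ("elasticity") approach: every segment in the unit is lengthened or shortened by the same number of its own standard deviations, so that the segment durations sum exactly to the unit duration. The names `SegmentStats` and `distribute_duration`, the phone labels, and the per-segment means and standard deviations are hypothetical and serve only as illustration; they are not measured data.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-segment duration statistics (milliseconds)."""
    label: str
    mean_ms: float
    sd_ms: float


def distribute_duration(target_ms, segments):
    """Split a V-to-V unit duration among its segments using one shared z-score.

    Assumption: each segment i receives mean_i + z * sd_i, with z chosen so
    that the segment durations add up to target_ms.
    """
    total_mean = sum(s.mean_ms for s in segments)
    total_sd = sum(s.sd_ms for s in segments)
    if total_sd <= 0:
        raise ValueError("segment standard deviations must sum to a positive value")
    z = (target_ms - total_mean) / total_sd
    # A list of pairs (not a dict) keeps order and allows repeated labels.
    return [(s.label, s.mean_ms + z * s.sd_ms) for s in segments]


if __name__ == "__main__":
    # Illustrative (made-up) statistics for a vowel-to-vowel unit "a s t".
    unit = [
        SegmentStats("a", 80.0, 20.0),
        SegmentStats("s", 90.0, 15.0),
        SegmentStats("t", 60.0, 10.0),
    ]
    print(distribute_duration(275.0, unit))  # [('a', 100.0), ('s', 105.0), ('t', 70.0)]
```

With these numbers the means sum to 230 ms and the standard deviations to 45 ms, so a 275 ms unit gives z = 1 and each segment lands one standard deviation above its mean. A fuller implementation would also need a minimum-duration floor for strongly negative z values, which this sketch omits.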
Bibliographic reference. Barbosa, Plínio A. (2001): "Generating duration from a cognitively plausible model of rhythm production", In EUROSPEECH-2001, 967-970.