Speech Prosody 2012

Shanghai, China
May 22-25, 2012

Multi-level Exemplar-Based Duration Generation for Expressive Speech Synthesis

Mohamed Abou-Zleikha, Éva Székely, Peter Cahill, Julie Carson-Berndsen

CNGL, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland

The generation of speech unit durations from linguistic information, as one component of a prosody model, is considered a requirement for natural-sounding speech synthesis. This paper investigates the use of a multi-level exemplar-based model for duration generation for the purposes of expressive speech synthesis. The multi-level exemplar-based model has been proposed in the literature as a cognitive model for the production of duration. Implementing this model for duration generation in speech synthesis is not straightforward: it requires a set of modifications to the model, and both the linguistically related units and the context of the target units must be taken into consideration. The work presented in this paper implements this model and presents a solution to these issues through the use of prosodic-syntactic correlated data, full context information of the input example and corpus exemplars.

Index Terms: speech prosody, duration generation, exemplar-based model

