Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Removing Phase Mismatches in Concatenative Speech Synthesis

Yannis Stylianou

AT&T Laboratories - Research, Florham Park, NJ, USA

Concatenation of acoustic units is used in most currently available text-to-speech systems. While this approach yields higher intelligibility and naturalness than synthesis-by-rule, it must cope with the problems of concatenating acoustic units that were recorded in a different order. One important issue in concatenation is the synchronization of speech frames or, in other words, inter-frame coherence. This paper presents a novel method for synchronizing signals, with applications to speech synthesis. The method is based on the notion of center of gravity applied to speech signals. It is an off-line approach: the synchronization can be done during analysis, so it places no computational burden on synthesis. The method has been tested with the Harmonic plus Noise Model (HNM) on many large speech databases. The resulting synthetic speech is free of phase mismatch (inter-frame incoherence) problems.
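
As a rough illustration of the center-of-gravity idea, the sketch below computes a center of gravity for each frame and shifts the frame so that this point lands at a common reference position. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: the energy-weighted definition, the circular shift, and the function names center_of_gravity and align_to_center are hypothetical and are not taken from the paper or from HNM.

```python
def center_of_gravity(frame):
    """Energy-weighted mean sample index of a frame.

    Assumed definition for illustration only: sum(n * x[n]^2) / sum(x[n]^2).
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        # Silent frame: fall back to the geometric middle.
        return (len(frame) - 1) / 2.0
    return sum(n * x * x for n, x in enumerate(frame)) / energy


def align_to_center(frame):
    """Circularly shift a frame so its center of gravity sits near the middle.

    Rounds the shift to a whole number of samples; a real system would
    likely need finer resolution (assumption, not from the paper).
    """
    target = (len(frame) - 1) / 2.0
    shift = int(round(target - center_of_gravity(frame)))
    k = shift % len(frame)  # right-rotation amount in [0, len(frame))
    return frame[-k:] + frame[:-k] if k else list(frame)


# Two toy frames holding the same pulse at different positions,
# standing in for units cut at inconsistent points in the waveform.
frame_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
frame_b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

aligned_a = align_to_center(frame_a)
aligned_b = align_to_center(frame_b)
```

After alignment, both toy frames carry the pulse at the same sample index, which is the property that matters when frames from different units are overlapped and added. Because the shift depends only on the frame itself, it can be computed once during analysis and stored, consistent with the off-line character described above.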

Full Paper (with 2 sound examples linked from within the paper)

Bibliographic reference. Stylianou, Yannis (1998): "Removing phase mismatches in concatenative speech synthesis", in SSW3-1998, 267-272.