Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

A selection/concatenation text-to-speech synthesis system: databases development, system design, comparative evaluation

Romain Prudon and Christophe d'Alessandro

LIMSI-CNRS, Orsay, France

This paper describes the development of a new text-to-speech (TTS) synthesis system for French. The system is based on the selection and concatenation of natural speech segments taken from large annotated speech databases. The first part presents the design, content, and annotation procedures of the databases. Speech databases of about 1 hour appear to be large enough for building a TTS system. The second part describes the system architecture. A key feature of the present system is that it uses only 4 simple and efficient selection criteria. The third part describes a formal comparative evaluation procedure. The experiments show that the new system is preferred in all evaluation categories over the previous system, which is based on diphone concatenation and rule-based synthesis of prosody. The most significant improvements brought by the new system seem to be in voice pleasantness and overall impression.

Bibliographic reference. Prudon, Romain / Alessandro, Christophe d' (2001): "A selection/concatenation text-to-speech synthesis system: databases development, system design, comparative evaluation", in SSW4-2001, paper 138.