5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Automatically Clustering Similar Units for Unit Selection in Speech Synthesis

Alan W. Black, Paul Taylor

Centre for Speech Technology Research, University of Edinburgh, Scotland, UK

This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. For each target unit, the appropriate cluster is selected, offering a small set of candidate units. An optimal path is then found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and its justification are presented. Results of experiments on two different databases are given, in which various parameters of the system are optimised. A comparison with existing selection-based synthesis techniques shows the advantages of this method. The method is implemented within a full text-to-speech system, offering efficient, natural-sounding speech synthesis.
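
The path search described in the abstract can be illustrated with a minimal sketch. It assumes a Viterbi-style dynamic programming search, which the abstract does not name explicitly. The function `select_units`, the `join_weight` parameter, and the shape of the candidate lists are hypothetical names chosen for illustration, and the cost values are left to the caller because the abstract does not specify them.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate is (unit_id, distance_to_cluster_center); both are illustrative.
Candidate = Tuple[str, float]


def select_units(
    candidates: Sequence[Sequence[Candidate]],
    join_cost: Callable[[str, str], float],
    join_weight: float = 1.0,  # assumed weighting, not taken from the paper
) -> Tuple[List[str], float]:
    """Return the cheapest unit sequence and its total cost.

    candidates[i] holds the units of the cluster chosen for target i.
    """
    if not candidates or any(len(column) == 0 for column in candidates):
        raise ValueError("every target needs at least one candidate")

    # best[j] = (cost of the cheapest path ending in candidate j, that path)
    best = [(dist, [unit]) for unit, dist in candidates[0]]

    for column in candidates[1:]:
        new_best = []
        for unit, dist in column:
            # Pick the predecessor that minimises accumulated cost plus join cost.
            cost, path = min(
                (
                    (prev_cost + join_weight * join_cost(prev_path[-1], unit), prev_path)
                    for prev_cost, prev_path in best
                ),
                key=lambda item: item[0],
            )
            new_best.append((cost + dist, path + [unit]))
        best = new_best

    total, path = min(best, key=lambda item: item[0])
    return path, total
```

In this sketch the distance from the cluster center plays the role of a per-unit cost, and `join_cost` stands in for the acoustically based join cost; any real system would supply its own measures for both.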

Bibliographic reference. Black, Alan W. / Taylor, Paul (1997): "Automatically clustering similar units for unit selection in speech synthesis", In EUROSPEECH-1997, 601-604.