Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Generalization and Discrimination in Tree-structured Unit Selection

Michael W. Macon, Andrew E. Cronk, Johan Wouters

Center for Spoken Language Understanding, Oregon Graduate Institute, Portland, OR, USA

Concatenative "selection-based" synthesis from large databases has emerged as a viable framework for TTS waveform generation. Unit selection algorithms attempt to predict the appropriateness of a particular database speech segment using only linguistic features output by text analysis and prosody prediction components of a synthesizer. All of these algorithms have in common a training or "learning" phase in which parameters are trained to select appropriate waveform segments for a given feature vector input. One approach to this step is to partition available data into clusters that can be indexed by linguistic features available at runtime. This method relies critically on two important principles: discrimination of fine phonetic details using a perceptually-motivated distance measure in training and generalization to unseen cases in selection. In this paper, we describe efforts to systematically investigate and improve these parts of the process.
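
As an illustration only, the clustering approach described above can be sketched in a few lines of Python. This is not the authors' implementation: the names `Unit`, `Node`, `impurity`, `grow`, and `select` are hypothetical, the toy feature names are invented, and plain squared Euclidean distance stands in for the perceptually-motivated distance measure mentioned in the abstract. The sketch grows a binary tree by greedily choosing the yes/no question about a linguistic feature that most reduces within-cluster acoustic spread, then looks up a cluster at runtime from linguistic features alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    features: dict  # linguistic context (hypothetical feature names)
    acoustic: list  # acoustic vector; a stand-in for a real parameterization


@dataclass
class Node:
    units: list
    question: Optional[tuple] = None  # (feature name, value) asked at this node
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def impurity(units):
    """Sum of squared Euclidean distances to the cluster mean (assumed measure)."""
    if not units:
        return 0.0
    dim = len(units[0].acoustic)
    mean = [sum(u.acoustic[i] for u in units) / len(units) for i in range(dim)]
    return sum((u.acoustic[i] - mean[i]) ** 2 for u in units for i in range(dim))


def grow(units, min_size=2):
    """Greedily split on the question that most reduces total impurity."""
    node = Node(units)
    best_gain, best = 0.0, None
    questions = sorted({(k, v) for u in units for k, v in u.features.items()})
    for name, value in questions:
        yes = [u for u in units if u.features.get(name) == value]
        no = [u for u in units if u.features.get(name) != value]
        if len(yes) < min_size or len(no) < min_size:
            continue  # keep every cluster large enough to be usable
        gain = impurity(units) - impurity(yes) - impurity(no)
        if gain > best_gain:
            best_gain, best = gain, ((name, value), yes, no)
    if best is not None:
        node.question, yes, no = best
        node.yes, node.no = grow(yes, min_size), grow(no, min_size)
    return node


def select(node, target):
    """Descend the tree using only the target's linguistic features."""
    while node.question is not None:
        name, value = node.question
        node = node.yes if target.get(name) == value else node.no
    return node.units


database = [
    Unit({"next": "n", "stress": "1"}, [1.0, 1.0]),
    Unit({"next": "n", "stress": "0"}, [1.1, 0.9]),
    Unit({"next": "t", "stress": "1"}, [5.0, 5.0]),
    Unit({"next": "t", "stress": "0"}, [5.2, 4.8]),
]
tree = grow(database)
# A stress value never seen in training still reaches a leaf cluster.
candidates = select(tree, {"next": "n", "stress": "2"})
```

In this toy setup the distance used while growing the tree governs discrimination (which acoustic differences justify a split), while the fall-through to the "no" branch in `select` is what lets a feature combination absent from the database still map to a cluster.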

Bibliographic reference.  Macon, Michael W. / Cronk, Andrew E. / Wouters, Johan (1998): "Generalization and Discrimination in Tree-structured Unit Selection", In SSW3-1998, 195-200.