Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Personalizing a Speech Synthesizer by Voice Adaptation

Alexander Kain, Mike Macon

Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology, Portland, OR, USA

A voice adaptation system enables users to quickly create new voices for a text-to-speech system, allowing the synthesis output to be personalized. The system adapts to the pitch and spectrum of the target speaker, using a probabilistic, locally linear conversion function based on a Gaussian Mixture Model. Numerical and perceptual evaluations provide insight into how adaptation quality relates to the amount of training data and the number of free parameters. A new joint density estimation algorithm is compared to a previous approach. Numerical errors are analyzed by broad phonetic category. A data augmentation method for training data with incomplete phonetic coverage is investigated and found to maintain high speech quality while partially adapting to the target voice.
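To illustrate what a "probabilistic, locally linear conversion function based on a Gaussian Mixture Model" can look like, here is a minimal sketch of the regression form commonly used with a GMM fitted to the joint density of source and target features. It assumes scalar features and already-trained parameters. The function names and all parameter values are hypothetical, chosen only for illustration. The system described above works on multidimensional pitch and spectral features, and its exact formulation may differ. Each mixture component contributes a local linear mapping, and the source-side posterior probabilities blend these mappings.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, weights, mu_x, mu_y, var_xx, cov_yx):
    """Map a source feature x to a target estimate with a joint-density GMM.

    Hypothetical scalar sketch: component i contributes the local linear
    regression  mu_y[i] + cov_yx[i] / var_xx[i] * (x - mu_x[i]),
    weighted by the posterior p(i | x) computed from the source-side
    marginal of the joint model (weights, mu_x, var_xx).
    """
    # Weighted likelihood of x under each component's source marginal.
    # (A production system would work in the log domain to avoid underflow.)
    likes = [w * gaussian_pdf(x, mx, vxx) for w, mx, vxx in zip(weights, mu_x, var_xx)]
    total = sum(likes)
    posteriors = [l / total for l in likes]

    # Posterior-weighted sum of the per-component linear predictions.
    y_hat = 0.0
    for p, mx, my, vxx, cyx in zip(posteriors, mu_x, mu_y, var_xx, cov_yx):
        y_hat += p * (my + cyx / vxx * (x - mx))
    return y_hat


if __name__ == "__main__":
    # Hypothetical two-component model; values are invented for illustration.
    params = dict(weights=[0.5, 0.5], mu_x=[-1.0, 1.0], mu_y=[-2.0, 2.0],
                  var_xx=[1.0, 1.0], cov_yx=[0.5, 0.5])
    print(convert(0.0, **params))  # 0.0: both components equally likely
```

With a single component, the function reduces to ordinary linear regression. With several components, the mapping is linear within each region of feature space and blends smoothly between regions. This is one reason the number of mixture components, and hence the number of free parameters, trades off against the amount of training data available.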


Bibliographic reference.  Kain, Alexander / Macon, Mike (1998): "Personalizing a Speech Synthesizer by Voice Adaptation", In SSW3-1998, 225-230.