Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
A voice adaptation system enables users to quickly create new voices for a text-to-speech system, allowing for the personalization of the synthesis output. The system adapts to the pitch and spectrum of the target speaker, using a probabilistic, locally linear conversion function based on a Gaussian Mixture Model. Numerical and perceptual evaluations show how adaptation quality correlates with the amount of training data and the number of free parameters. A new joint density estimation algorithm is compared to a previous approach. Numerical errors are studied on the basis of broad phonetic categories. A data augmentation method for training data with incomplete phonetic coverage is investigated and found to maintain high speech quality while partially adapting to the target voice.
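The abstract names a probabilistic, locally linear conversion function based on a Gaussian Mixture Model but does not state it. As a rough illustration only, the sketch below assumes the commonly used joint-density form, in which a GMM is fitted to stacked source/target vectors and a source vector x is mapped to the conditional expectation E[y | x]. The function names (`gaussian_pdf`, `convert`), the parameter layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal distribution at a single point x."""
    d = x.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff)  # cov^{-1} (x - mean)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ solved) / norm)


def convert(x, weights, means, covs, dim_x):
    """Map a source vector x to E[y | x] under a joint GMM over [x; y].

    Assumed layout (hypothetical): each mean is the stacked vector
    [mu_x; mu_y] and each covariance is the full joint matrix, with the
    first dim_x rows/columns belonging to the source features.
    """
    x = np.asarray(x, dtype=float)

    # Posterior probability of each mixture component given x alone,
    # computed from the source marginal of every component.
    likelihoods = np.array([
        w * gaussian_pdf(x, m[:dim_x], c[:dim_x, :dim_x])
        for w, m, c in zip(weights, means, covs)
    ])
    posteriors = likelihoods / likelihoods.sum()

    # Each component contributes its own linear regression from x to y;
    # the posteriors blend them, which makes the mapping locally linear.
    y = np.zeros(means[0].shape[0] - dim_x)
    for p, m, c in zip(posteriors, means, covs):
        mu_x, mu_y = m[:dim_x], m[dim_x:]
        cov_xx = c[:dim_x, :dim_x]
        cov_yx = c[dim_x:, :dim_x]
        y += p * (mu_y + cov_yx @ np.linalg.solve(cov_xx, x - mu_x))
    return y


# Toy parameters (made up): two components, 1-D source, 1-D target.
toy_weights = [0.5, 0.5]
toy_means = [np.array([0.0, 1.0]), np.array([4.0, 3.0])]
toy_covs = [
    np.array([[1.0, 0.5], [0.5, 1.0]]),
    np.array([[1.0, -0.5], [-0.5, 1.0]]),
]
print(convert([2.0], toy_weights, toy_means, toy_covs, dim_x=1))
```

In this form the number of mixture components and the covariance structure set the number of free parameters, which is the quantity the evaluations relate to adaptation quality. In practice the GMM parameters would be estimated from time-aligned source and target feature vectors rather than written by hand.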
Bibliographic reference. Kain, Alexander / Macon, Mike (1998): "Personalizing a Speech Synthesizer by Voice Adaptation", In SSW3-1998, 225-230.