Third ESCA/COCOSDA Workshop on Speech Synthesis

November 26-29, 1998
Jenolan Caves House, Blue Mountains, NSW, Australia

Speaker Adaptation for HMM-based Speech Synthesis System Using MLLR

Masatsune Tamura (1), Takashi Masuko (1), Keiichi Tokuda (2), Takao Kobayashi (1)

(1) Tokyo Institute of Technology, Yokohama, Japan
(2) Nagoya Institute of Technology, Nagoya, Japan

This paper describes a voice characteristics conversion technique for an HMM-based text-to-speech synthesis system. The system uses phoneme HMMs as the speech synthesis units, and voice characteristics conversion is achieved by appropriately changing the HMM parameters. To transform the voice characteristics of synthetic speech to those of a target speaker, we apply MLLR (Maximum Likelihood Linear Regression), a speaker adaptation technique, to the system. The results of objective and subjective tests show that the characteristics of the synthetic speech are close to the target speaker's voice, and that speech generated from a model set adapted with only 5 sentences achieves almost the same DMOS score as speech generated from a speaker-dependent model set.
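To illustrate what "changing HMM parameters" with MLLR can mean, here is a minimal sketch of the standard MLLR mean transform: each Gaussian mean mu is replaced by W [1, mu], where the affine matrix W is estimated by maximum likelihood from adaptation data. The abstract does not say which variant the paper uses (number of regression classes, whether variances or durations are also adapted), so this sketch assumes a single global transform, diagonal covariances, and that per-Gaussian occupancy counts and occupancy-weighted observation sums have already been collected (e.g. by forward-backward alignment). The function names `estimate_mllr_mean_transform` and `adapt_means` are hypothetical.

```python
import numpy as np


def estimate_mllr_mean_transform(means, variances, occupancy, obs_sum):
    """Estimate one global MLLR mean transform W of shape (d, d + 1).

    Hypothetical helper; assumes diagonal covariances and precomputed stats.
    means:     (M, d) speaker-independent Gaussian means
    variances: (M, d) diagonal covariance entries
    occupancy: (M,)   sum_t gamma_m(t), total occupation per Gaussian
    obs_sum:   (M, d) sum_t gamma_m(t) * o(t), weighted observation sums
    """
    M, d = means.shape
    # Extended mean vectors xi_m = [1, mu_m] so W can include a bias term.
    xi = np.hstack([np.ones((M, 1)), means])
    W = np.zeros((d, d + 1))
    # With diagonal covariances, each row w_i of W is solved independently.
    for i in range(d):
        inv_var = 1.0 / variances[:, i]
        # G_i = sum_m (occ_m / var_mi) * xi_m xi_m^T
        G = (xi * (occupancy * inv_var)[:, None]).T @ xi
        # k_i = sum_m (obs_sum_mi / var_mi) * xi_m
        k = xi.T @ (obs_sum[:, i] * inv_var)
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(means, W):
    """Apply the transform: adapted mean = W @ [1, mu] for every Gaussian."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Sharing one transform across all Gaussians is what lets a small amount of adaptation data move every distribution in the model set, including those never observed in the adaptation sentences; with more data, one would typically split the Gaussians into several regression classes, each with its own W.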

Full Paper (with 4 sound examples linked from within the paper)

