Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Improved Average-Voice-based Speech Synthesis using Gender-Mixed Modeling and a Parameter Generation Algorithm considering GV

Junichi Yamagishi (1), Takao Kobayashi (2), Steve Renals (1), Simon King (1), Heiga Zen (3), Tomoki Toda (4), Keiichi Tokuda (3)

(1) University of Edinburgh, UK; (2) Tokyo Institute of Technology, Japan; (3) Nagoya Institute of Technology, Japan; (4) Nara Institute of Science and Technology, Japan

To construct a speech synthesis system that can produce diverse voices, we have been developing a speaker-independent approach to HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate a high-quality speech vocoding method, STRAIGHT, and a parameter generation algorithm considering global variance (GV) into the system to improve the quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender-mixed modeling technique to further normalize the average voice model. We build an English text-to-speech system using these techniques and report its performance.

Bibliographic reference. Yamagishi, Junichi / Kobayashi, Takao / Renals, Steve / King, Simon / Zen, Heiga / Toda, Tomoki / Tokuda, Keiichi (2007): "Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV", In SSW6-2007, 125-130.