13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation

Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda

Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan

The language mapping performed in a cross-lingual speaker adaptation task may not produce satisfactory results when a bilingual database is not available. To overcome this problem, this work proposes a new method in which a correspondence between speakers in two different databases, speaking different languages, is established based on the perceptual characteristics of their voices. The proposed approach uses a language-independent space of voice characteristics obtained from subjective listening tests. This space is used in the speaker adaptation process, making it possible to represent the input speaker in a different language while preserving his/her voice characteristics, without requiring a bilingual database. Furthermore, the method can potentially adapt prosodic information from the target speaker, such as long-term changes in F0 and durations. Evaluation listening tests confirmed that the proposed framework generates speech that sounds similar to the target speaker's voice, with better speech quality than the previously proposed method.

Index Terms: HMM-based speech synthesis, cross-lingual speaker adaptation, eigenvoices, perceptual characteristics, speaker interpolation

