EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Evaluation of Cross-Language Voice Conversion Based on GMM and STRAIGHT

Mikiko Mashimo (1), Tomoki Toda (1), Kiyohiro Shikano (1), Nick Campbell (2)

(1) Nara Institute of Science and Technology, Japan
(2) ATR Information Sciences Division, Japan

Voice conversion is a technique for producing utterances in any target speaker's voice from a single source speaker's utterances. In this paper, we apply a voice conversion system based on a Gaussian Mixture Model (GMM) method and STRAIGHT, a high-quality vocoder, to cross-language conversion between Japanese and English. To investigate how the system behaves across languages, we recorded two sets of bilingual utterances and performed voice conversion experiments using a mapping function that converts the acoustic feature parameters of a source speaker to those of a target speaker. The mapping functions were trained on bilingual databases of both Japanese and English speech. An objective evaluation using Mel cepstrum distortion (Mel CD) confirmed that the system performs cross-language voice conversion with the same performance as conversion within a single language.
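As an illustration of the objective measure named above, the following minimal sketch computes a mean Mel cepstrum distortion in dB between two time-aligned sequences of Mel cepstral vectors. The abstract does not state the exact formula used in the paper, so the sketch assumes the commonly used definition, (10 / ln 10) * sqrt(2 * sum_d (t_d - c_d)^2) averaged over frames. The function name `mel_cepstral_distortion` and the input layout (lists of per-frame coefficient lists, energy term already removed, frames already aligned) are hypothetical choices made for this example.

```python
import math


def mel_cepstral_distortion(target, converted):
    """Return the mean Mel cepstrum distortion in dB.

    Assumptions (not taken from the paper): both inputs are lists of frames,
    each frame a list of Mel cepstral coefficients without the energy term,
    and the two sequences are already time-aligned frame by frame.
    """
    if len(target) != len(converted):
        raise ValueError("sequences must have the same number of frames")
    if not target:
        raise ValueError("sequences must contain at least one frame")

    scale = 10.0 / math.log(10.0)  # converts the natural-log-based distance to dB
    total = 0.0
    for t_frame, c_frame in zip(target, converted):
        if len(t_frame) != len(c_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance between the two cepstral vectors.
        squared = sum((t - c) ** 2 for t, c in zip(t_frame, c_frame))
        total += scale * math.sqrt(2.0 * squared)

    # Average the per-frame distortion over the utterance.
    return total / len(target)
```

Under this definition a lower value means the converted features lie closer to the target speaker's features, and identical sequences give 0 dB.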


Bibliographic reference.  Mashimo, Mikiko / Toda, Tomoki / Shikano, Kiyohiro / Campbell, Nick (2001): "Evaluation of cross-language voice conversion based on GMM and straight", In EUROSPEECH-2001, 361-364.