EUROSPEECH 2001 Scandinavia
Voice conversion is a technique for producing utterances in any target speaker's voice from a single source speaker's utterance. In this paper, we apply cross-language voice conversion between Japanese and English to a system based on a Gaussian Mixture Model (GMM) method and STRAIGHT, a high-quality vocoder. To investigate the effects of this conversion system across different languages, we recorded two sets of bilingual utterances and performed voice conversion experiments using a mapping function that converts the acoustic feature parameters of a source speaker to those of a target speaker. The mapping functions were trained using bilingual databases of both Japanese and English speech. An objective evaluation using Mel cepstrum distortion (Mel CD) confirmed that the system performs cross-language voice conversion as well as it performs conversion within a single language.
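As an illustration of the objective measure named above, the following is a minimal sketch of Mel cepstrum distortion between a target and a converted utterance. It uses the commonly cited definition (per-frame distortion in dB, averaged over frames); the abstract does not give the exact formula used in the paper, so the definition, the function name `mel_cepstral_distortion`, and the assumptions that the two sequences are already time-aligned and exclude the 0th (energy) coefficient are all illustrative.

```python
import numpy as np


def mel_cepstral_distortion(target, converted):
    """Mean Mel CD in dB between two time-aligned mel-cepstrum sequences.

    Both inputs have shape (frames, order). Assumption: the frames are
    already aligned (e.g. by dynamic time warping) and the 0th (energy)
    coefficient has been dropped by the caller.
    """
    target = np.asarray(target, dtype=float)
    converted = np.asarray(converted, dtype=float)
    if target.ndim != 2 or target.shape != converted.shape:
        raise ValueError("inputs must be 2-D, time-aligned, and equal in shape")

    diff = target - converted
    # Commonly used per-frame definition:
    # (10 / ln 10) * sqrt(2 * sum_d (mc_target[d] - mc_converted[d])^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```

A lower value means the converted mel-cepstra are closer to the target speaker's. Under this definition, comparing the within-language and cross-language conditions amounts to comparing the mean Mel CD obtained on the same evaluation utterances.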
Bibliographic reference. Mashimo, Mikiko / Toda, Tomoki / Shikano, Kiyohiro / Campbell, Nick (2001): "Evaluation of cross-language voice conversion based on GMM and STRAIGHT", In EUROSPEECH-2001, 361-364.