Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Eigenvoice Conversion Based on Gaussian Mixture Model

Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano

Nara Institute of Science & Technology, Japan

This paper describes a novel framework of voice conversion (VC). We call it eigenvoice conversion (EVC). We apply EVC to the conversion from a source speaker's voice to arbitrary target speakers' voices. Using multiple parallel data sets consisting of utterance-pairs of the source and multiple pre-stored target speakers, a canonical eigenvoice GMM (EV-GMM) is trained in advance. That conversion model enables us to flexibly control the speaker individuality of the converted speech by manually setting weight parameters. In addition, the optimum weight set for a specific target speaker is estimated using only speech data of the target speaker without any linguistic restrictions. We evaluate the performance of EVC by a spectral distortion measure. Experimental results demonstrate that EVC works very well even if we use only a few utterances of the target speaker for the weight estimation.
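
The abstract does not give the model equations, so the following is only an illustrative sketch of the general eigenvoice idea, not the authors' implementation. It assumes that, for one mixture component, the target mean vector is a bias vector plus a weighted sum of basis vectors, so that changing the weights changes the speaker individuality. The function name `target_mean` and the toy numbers are hypothetical.

```python
def target_mean(bias, eigenvectors, weights):
    """Return bias + sum_j weights[j] * eigenvectors[j] for one mixture component.

    Assumption (not stated in the abstract): the target mean is a linear
    function of the weight set, as in standard eigenvoice modelling.
    """
    if len(eigenvectors) != len(weights):
        raise ValueError("need exactly one weight per eigenvector")
    mean = list(bias)
    for weight, vector in zip(weights, eigenvectors):
        if len(vector) != len(mean):
            raise ValueError("eigenvector and bias dimensions differ")
        for d, value in enumerate(vector):
            mean[d] += weight * value
    return mean


# Toy 2-dimensional example with two basis vectors (made-up values).
bias = [1.0, 2.0]
eigenvectors = [[1.0, 0.0], [0.0, 2.0]]

print(target_mean(bias, eigenvectors, [0.0, 0.0]))   # all-zero weights give the bias vector
print(target_mean(bias, eigenvectors, [0.5, -1.0]))  # a different weight set shifts the mean
```

Under this assumption, manually setting the weights moves the target means within the space spanned by the basis vectors, while estimating them from a few target utterances selects one point in that space.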

Bibliographic reference.  Toda, Tomoki / Ohtani, Yamato / Shikano, Kiyohiro (2006): "Eigenvoice conversion based on Gaussian mixture model", In INTERSPEECH-2006, paper 1717-Thu2A3O.5.