13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

A Study of Mutual Information for GMM-Based Spectral Conversion

Hsin-Te Hwang (1), Yu Tsao (2), Hsin-Min Wang (3), Yih-Ru Wang (1), Sin-Horng Chen (1)

(1) Dept. of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
(2) Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
(3) Institute of Information Science, Academia Sinica, Taipei, Taiwan

The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for the last decade. However, the converted spectra are excessively smoothed and thus produce muffled converted sound. In this study, we improve speech quality by enhancing the dependency between the source feature vectors (natural sound) and the converted feature vectors (converted sound). It is believed that enhancing this dependency can make the converted sound closer to natural sound. To this end, we propose an integrated maximum a posteriori and mutual information (MAPMI) criterion for parameter generation in spectral conversion. Experimental results demonstrate that, in a formal listening test, speech converted by the proposed MAPMI method is of higher quality than speech converted by the conventional method.

Index Terms: Voice conversion, mutual information, GMM.
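
The notion of "dependency" between source and converted features can be illustrated with a small numerical sketch. This is not the MAPMI criterion itself, which the abstract does not spell out. It only shows how mutual information quantifies dependency, under the simplifying assumption that two scalar feature trajectories are jointly Gaussian, in which case I(X;Y) = -0.5 * log(1 - rho^2) with rho the correlation coefficient. The synthetic trajectories and the helper names `pearson` and `gaussian_mutual_information` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def gaussian_mutual_information(x, y):
    """Mutual information in nats, ASSUMING x and y are jointly Gaussian."""
    rho = pearson(x, y)
    return -0.5 * math.log(1.0 - rho * rho)


# Synthetic stand-ins for one feature dimension (illustration only, not real speech).
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# A "converted" trajectory that follows the source closely ...
strongly_dependent = [0.9 * s + 0.1 * rng.gauss(0.0, 1.0) for s in source]
# ... and one that is mostly independent noise.
weakly_dependent = [0.1 * s + 0.9 * rng.gauss(0.0, 1.0) for s in source]

mi_strong = gaussian_mutual_information(source, strongly_dependent)
mi_weak = gaussian_mutual_information(source, weakly_dependent)

print(f"MI (strong dependency): {mi_strong:.3f} nats")
print(f"MI (weak dependency):   {mi_weak:.3f} nats")
```

In this toy setting, the trajectory that tracks the source more closely yields the larger mutual information, which is the quantity a mutual-information term in a generation criterion would reward.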

Audio Examples

15 different sentences, each in four versions: (source) source speaker; (target) target speaker; (proposed) proposed method; (conventional) conventional method
(source)   (target)   (proposed)   (conventional)   (1)
(source)   (target)   (proposed)   (conventional)   (2)
(source)   (target)   (proposed)   (conventional)   (3)
(source)   (target)   (proposed)   (conventional)   (4)
(source)   (target)   (proposed)   (conventional)   (5)
(source)   (target)   (proposed)   (conventional)   (6)
(source)   (target)   (proposed)   (conventional)   (7)
(source)   (target)   (proposed)   (conventional)   (8)
(source)   (target)   (proposed)   (conventional)   (9)
(source)   (target)   (proposed)   (conventional)   (10)
(source)   (target)   (proposed)   (conventional)   (11)
(source)   (target)   (proposed)   (conventional)   (12)
(source)   (target)   (proposed)   (conventional)   (13)
(source)   (target)   (proposed)   (conventional)   (14)
(source)   (target)   (proposed)   (conventional)   (15)

Bibliographic reference.  Hwang, Hsin-Te / Tsao, Yu / Wang, Hsin-Min / Wang, Yih-Ru / Chen, Sin-Horng (2012): "A study of mutual information for GMM-based spectral conversion", In INTERSPEECH-2012, 78-81.