First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

A Minimum Distortion Spectral Mapping Applied to Voice Quality Conversion

Hiroshi Matsumoto, Hirowo Inoue

Dept. of Electrical and Electronic Eng., Faculty of Eng., Shinshu University, Nagano, Japan

This paper presents supervised and unsupervised methods for spectral mapping between speakers, and an application to voice conversion based on a PSE analysis-synthesis system. In both methods, a spectrum of one speaker is converted to that of another speaker by interpolating speaker difference vectors estimated at given points in the spectral space. In the supervised method, these speaker difference vectors are estimated by minimizing the spectral distortion along the DTW path between the mapped and target spectral sequences of the training samples. In the unsupervised method, the spectral codebook of the target speaker is first mapped onto the source speaker by minimizing a fuzzy objective function; the source spectra are then converted by a fuzzy mapping that uses the mapped and target codebooks. In a voice conversion experiment, both methods attained an average correct score of 84% in speaker discrimination of the converted voices for male-to-male conversion, and the supervised method attained an average correct score of 70% for male-to-female conversion.
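The conversion step described above, interpolating speaker difference vectors over a codebook, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the fuzzy c-means style membership, the fuzziness exponent `m`, and the function names `fuzzy_memberships` and `convert` are all assumptions made for illustration. Spectra are represented as plain lists of floats.

```python
def fuzzy_memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy c-means style memberships of x to each codeword (assumed form)."""
    # Squared Euclidean distance from x to every codeword.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    # If x coincides with a codeword, that codeword gets full membership.
    for i, d in enumerate(d2):
        if d < eps:
            return [1.0 if j == i else 0.0 for j in range(len(codebook))]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def convert(x, mapped_codebook, target_codebook, m=2.0):
    """Shift a source spectrum x by an interpolated speaker difference vector.

    mapped_codebook[i] is the target codeword i mapped into the source
    speaker's space; target_codebook[i] is the original target codeword.
    Their difference is treated as the speaker difference vector at that
    point (an assumption of this sketch).
    """
    u = fuzzy_memberships(x, mapped_codebook, m)
    out = list(x)
    for w, src, tgt in zip(u, mapped_codebook, target_codebook):
        for k in range(len(x)):
            out[k] += w * (tgt[k] - src[k])
    return out


# Toy two-dimensional example with hypothetical codebooks.
mapped = [[0.0, 0.0], [2.0, 0.0]]
target = [[1.0, 1.0], [3.0, -1.0]]
print(convert([1.0, 0.0], mapped, target))  # [2.0, 0.0]
```

In the toy example the input lies midway between the two mapped codewords, so each difference vector contributes with weight 0.5 and the vertical components cancel. An input that coincides with a mapped codeword is moved exactly onto the corresponding target codeword.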


Bibliographic reference.  Matsumoto, Hiroshi / Inoue, Hirowo (1990): "A minimum distortion spectral mapping applied to voice quality conversion", In ICSLP-1990, 161-164.