13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

An HMM Approach to Residual Estimation for High Resolution Voice Conversion

Winston Percybrooks (1,2), Elliot Moore (1)

(1) School of Electrical and Computer Engineering, Georgia Institute of Technology, Savannah, GA, USA
(2) Department of Electrical and Electronics Engineering, Universidad del Norte, Barranquilla, Colombia

Voice conversion systems aim to process speech from a source speaker so that it is perceived as spoken by a target speaker. This paper presents a procedure to improve high-resolution voice conversion by modifying the algorithm used for residual estimation. The proposed residual estimation algorithm uses a hidden Markov model to exploit the temporal dependencies between the residuals of consecutive speech frames. A previous residual estimation technique based on Gaussian mixture models serves as the baseline for comparison. Both algorithms were evaluated with listening tests measuring perceived identity conversion and the quality of the converted speech. Compared with the baseline, the proposed algorithm produced converted speech of significantly better quality without degrading identity conversion performance, and it worked particularly well for female target speakers and cross-gender conversions.
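The abstract contrasts frame-by-frame residual estimation based on Gaussian mixtures with an HMM that models dependencies between consecutive frames. The following is a generic sketch of that contrast, not the authors' implementation; the paper's actual state definition, features, and training procedure are not described in this abstract. It assumes (hypothetically) that each mixture component or HMM state is paired with one prototype residual in a `residual_codebook`, that emissions are diagonal Gaussians over per-frame spectral features, and that the transition matrix `log_trans` would be estimated from state sequences in training data. The baseline-style function blends prototypes by per-frame posterior; the HMM-style function uses Viterbi decoding to choose a whole state sequence at once.

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    """Log-density of frame x under each of K diagonal Gaussians.
    x: (D,), means/variances: (K, D) -> returns (K,)."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)

def gmm_frame_residuals(features, means, variances, weights, residual_codebook):
    """GMM-style sketch: each frame is handled independently.
    The residual is the posterior-weighted average of the (hypothetical)
    per-component prototype residuals, residual_codebook: (K, R)."""
    out = []
    for x in features:
        log_p = np.log(weights) + log_gaussian_diag(x, means, variances)
        post = np.exp(log_p - log_p.max())  # subtract max for numerical stability
        post /= post.sum()
        out.append(post @ residual_codebook)
    return np.array(out)

def hmm_viterbi_residuals(features, means, variances, log_trans, log_init, residual_codebook):
    """HMM-style sketch: find the most likely state sequence, so the choice
    for frame t depends on neighboring frames through log_trans (K, K).
    Returns the state path and the prototype residual of each chosen state."""
    T, K = len(features), len(means)
    delta = np.empty((T, K))             # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)   # best predecessor state
    delta[0] = log_init + log_gaussian_diag(features[0], means, variances)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: prev i -> cur j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_gaussian_diag(features[t], means, variances)
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):       # backtrack through stored predecessors
        path[t - 1] = back[t, path[t]]
    return path, residual_codebook[path]
```

In a toy setup with two states, a strong self-transition probability (for example 0.9), and one ambiguous frame whose features lean slightly toward the other state, Viterbi decoding keeps that frame in the surrounding state, while the per-frame GMM posterior tips toward the other prototype. This illustrates one way temporal context can suppress frame-to-frame jumps in the estimated residual.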

Index Terms: Voice conversion, residual estimation, HMM, MOS test, ABX test


Bibliographic reference: Percybrooks, Winston / Moore, Elliot (2012): "An HMM approach to residual estimation for high resolution voice conversion", In INTERSPEECH-2012, 90-93.