An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity

Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Quy Hy Nguyen, Minghui Dong, Haizhou LI


Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. Evaluating voice conversion approaches usually relies on time-intensive subjective listening tests, which require a large amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge (VCC2016).


DOI: 10.21437/SSW.2016-8

Cite as

Huang, D., Xie, L., Lee, Y.S.W., Wu, J., Ming, H., Tian, X., Zhang, S., Ding, C., Li, M., Nguyen, Q.H., Dong, M., LI, H. (2016) An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity. Proc. 9th ISCA Speech Synthesis Workshop, 44-51.

BibTeX
@inproceedings{Huang+2016,
author={Dong-Yan Huang and Lei Xie and Yvonne Siu Wa Lee and Jie Wu and Huaiping Ming and Xiaohai Tian and Shaofei Zhang and Chuang Ding and Mei Li and Quy Hy Nguyen and Minghui Dong and Haizhou LI},
title={An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity},
year=2016,
booktitle={9th ISCA Speech Synthesis Workshop},
doi={10.21437/SSW.2016-8},
url={http://dx.doi.org/10.21437/SSW.2016-8},
pages={44--51}
}