Novel Pre-processing using Outlier Removal in Voice Conversion

Sushant V. Rao, Nirmesh J Shah, Hemant A. Patil

Voice conversion (VC) modifies a speech utterance spoken by a source speaker so that it sounds as if a target speaker had spoken it. Gaussian Mixture Model (GMM)-based VC is a state-of-the-art method. It finds the mapping function by modeling the joint density of the source and target speakers' features with a GMM, and uses this mapping to convert spectral features frame by frame. As with any real dataset, the spectral parameters contain a few points that are inconsistent with the rest of the data, called outliers. Until now, there has been very little literature on the effect of outliers in voice conversion. In this paper, we explore the effect of outliers in voice conversion and their removal as a pre-processing step. To remove these outliers, we use the score distance, which is computed from the scores estimated using Robust Principal Component Analysis (ROBPCA). The outliers are determined using a cut-off value based on the degrees of freedom of a chi-squared distribution. They are then removed from the training dataset, and a GMM is trained on the least outlying points. This pre-processing step can be applied to various methods. Experimental results indicate a clear improvement in both the objective (8 %) and the subjective (4 % for MOS and 5 % for XAB) results.
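
The outlier-screening step described above can be illustrated with a minimal sketch. It is not the authors' implementation: classical PCA is used as a stand-in for ROBPCA (which would supply robust estimates of the center, loadings, and eigenvalues), and the function name `score_distance_mask`, the number of components `k`, and the 0.975 chi-squared quantile are assumptions made for illustration only.

```python
import numpy as np
from scipy.stats import chi2


def score_distance_mask(X, k, quantile=0.975):
    """Flag frames whose score distance exceeds a chi-squared cut-off.

    X        : (n_frames, n_dims) array of spectral feature vectors.
    k        : number of principal components retained (assumed value).
    quantile : chi-squared quantile for the cut-off (assumed value).

    NOTE: classical PCA is a stand-in here; the paper uses ROBPCA scores.
    Returns (keep_mask, score_distances, cutoff).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)

    # Eigen-decomposition of the sample covariance matrix.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the k directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:k]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Scores are the projections of each frame onto the retained directions.
    scores = centered @ eigvecs

    # Score distance: distance within the PCA subspace, with each
    # component scaled by its eigenvalue.
    sd = np.sqrt(np.sum(scores ** 2 / eigvals, axis=1))

    # Cut-off from a chi-squared distribution with k degrees of freedom.
    cutoff = np.sqrt(chi2.ppf(quantile, df=k))
    return sd <= cutoff, sd, cutoff
```

In a VC pipeline, the returned mask would be applied to the training frames before fitting the joint-density GMM. How the mask is applied to aligned source-target frame pairs is not specified in the abstract, so that step is left out here.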

DOI: 10.21437/SSW.2016-22

Cite as

Rao, S.V., Shah, N.J., Patil, H.A. (2016) Novel Pre-processing using Outlier Removal in Voice Conversion. Proc. 9th ISCA Speech Synthesis Workshop, 134-139.

author={Sushant V. Rao and Nirmesh J Shah and Hemant A. Patil},
title={Novel Pre-processing using Outlier Removal in Voice Conversion},
booktitle={9th ISCA Speech Synthesis Workshop},