4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Training Data Selection for Voice Conversion Using Speaker Selection and Vector Field Smoothing

Makoto Hashimoto, Norio Higuchi

ATR Interpreting Telecommunications Research Labs., Soraku-gun, Kyoto, Japan

We have previously proposed a spectral mapping method, SSVFS, which uses speaker selection and vector field smoothing techniques to perform voice conversion with a small amount of training data. Both objective and subjective evaluations have shown that SSVFS is effective for spectral mapping and that it can operate with as little as one word of training data [1]. This paper proposes a criterion for selecting effective training data for SSVFS. The criterion is the coverage of the parameter space, defined with respect to the training procedure of SSVFS. It is useful not only for selecting effective training samples, which is important for efficient learning of spectral characteristics, but also for estimating the degree to which learning has been carried out. To evaluate the validity of the proposed criterion, we measured the correlation between spectral resemblance and coverage. The mean correlation coefficient over eight target speakers was -0.74 with the proposed criterion and -0.59 when the training procedure was not taken into account. We conclude that the proposed criterion is useful for selecting effective training samples for SSVFS.


Bibliographic reference.  Hashimoto, Makoto / Higuchi, Norio (1996): "Training data selection for voice conversion using speaker selection and vector field smoothing", In ICSLP-1996, 1397-1400.