CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion

Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo


Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speech without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results on this task and have been widely used as benchmark methods. However, because their effectiveness for mel-spectrogram conversion remains unclear, CycleGAN-VC/VC2 are typically applied to mel-cepstrum conversion even when comparative methods use the mel-spectrogram as the conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct application compromises the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that, for every VC pair, CycleGAN-VC3 outperforms or is competitive with two variants of CycleGAN-VC2, one applied to the mel-cepstrum and the other to the mel-spectrogram.
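To make the TFAN idea concrete, below is a minimal PyTorch sketch of a normalization layer that scales and shifts intermediate features element-wise using maps predicted from the source mel-spectrogram, in the spirit of the abstract's description. The SPADE-style formulation, the layer widths, kernel sizes, and all names (TFAN, to_gamma, to_beta, hidden_channels) are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TFAN(nn.Module):
    """Sketch of time-frequency adaptive normalization.

    Normalizes intermediate features, then modulates them with
    element-wise scale (gamma) and bias (beta) maps computed from
    the source mel-spectrogram, so the source's time-frequency
    structure is reflected in the converted features.
    All hyperparameters here are assumptions for illustration.
    """

    def __init__(self, feature_channels, mel_channels=1, hidden_channels=128):
        super().__init__()
        # Parameter-free normalization; modulation is supplied below.
        self.norm = nn.InstanceNorm2d(feature_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(mel_channels, hidden_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.to_gamma = nn.Conv2d(hidden_channels, feature_channels,
                                  kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden_channels, feature_channels,
                                 kernel_size=3, padding=1)

    def forward(self, x, source_mel):
        # x: (B, C, F, T) intermediate features
        # source_mel: (B, 1, F_mel, T_mel) source mel-spectrogram
        normalized = self.norm(x)
        # Resize the source mel-spectrogram to the feature resolution
        # so gamma/beta can be applied element-wise in time-frequency.
        mel = F.interpolate(source_mel, size=x.shape[2:], mode='nearest')
        h = self.shared(mel)
        gamma = self.to_gamma(h)
        beta = self.to_beta(h)
        return normalized * (1 + gamma) + beta

# Example usage with dummy tensors:
# tfan = TFAN(feature_channels=64)
# x = torch.randn(2, 64, 20, 128)
# mel = torch.randn(2, 1, 80, 512)
# y = tfan(x, mel)  # same shape as x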


DOI: 10.21437/Interspeech.2020-2280

Cite as: Kaneko, T., Kameoka, H., Tanaka, K., Hojo, N. (2020) CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion. Proc. Interspeech 2020, 2017-2021, DOI: 10.21437/Interspeech.2020-2280.


@inproceedings{Kaneko2020,
  author={Takuhiro Kaneko and Hirokazu Kameoka and Kou Tanaka and Nobukatsu Hojo},
  title={{CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2017--2021},
  doi={10.21437/Interspeech.2020-2280},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2280}
}