Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra

Toru Nakashika


In recent years, variational autoencoders (VAEs) have attracted interest for many applications and generative tasks. Although the VAE is one of the most powerful deep generative models, it still has difficulty representing complex-valued data such as the complex spectra of speech. In speech synthesis, the VAE is usually used to encode Mel-cepstra or raw amplitude spectra of a speech signal into normally distributed latent features, and the speech is then synthesized from the reconstruction using the Griffin-Lim algorithm or another vocoder. Such inputs are calculated from complex spectra but lack the phase information, which leads to degradation when recovering speech. In this work, we propose a novel generative model that directly encodes the complex spectra by extending the conventional VAE. The proposed model, which we call the complex-valued VAE (CVAE), consists of two complex-valued neural networks (CVNNs): an encoder and a decoder. In the CVAE, not only the inputs and the parameters of the encoder and decoder but also the latent features are defined as complex-valued, so that the phase information is preserved throughout the network. The results of our speech encoding experiments demonstrated the effectiveness of the CVAE compared to the conventional VAE in both objective and subjective criteria.
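
The phase loss mentioned in the abstract can be illustrated with a small sketch. This is not the paper's model or experimental setup. It assumes a synthetic two-tone frame of 256 samples, and all variable names are hypothetical. It only shows that a frame is recovered exactly from its complex spectrum, while the amplitude spectrum alone (with the phase set to zero) gives a different waveform.

```python
import numpy as np

# Hypothetical test frame: two sinusoids with non-zero phase offsets.
n = 256
t = np.arange(n)
frame = (np.sin(2 * np.pi * 5 * t / n + 0.7)
         + 0.5 * np.sin(2 * np.pi * 12 * t / n + 2.1))

# Complex spectrum of the frame, split into amplitude and phase.
spec = np.fft.rfft(frame)
amplitude = np.abs(spec)
phase = np.angle(spec)

# Reconstruction from the full complex spectrum (phase kept).
recon_complex = np.fft.irfft(spec, n=n)

# Reconstruction from the amplitude only (phase discarded, i.e. zero phase).
recon_amp_only = np.fft.irfft(amplitude, n=n)

# Maximum absolute reconstruction errors for the two cases.
err_complex = np.max(np.abs(frame - recon_complex))
err_amp_only = np.max(np.abs(frame - recon_amp_only))

print(f"max error with phase:    {err_complex:.2e}")
print(f"max error without phase: {err_amp_only:.2e}")
```

In practice, a phase-reconstruction method such as Griffin-Lim estimates the missing phase rather than setting it to zero. The sketch only shows why some estimate is needed once the phase has been dropped.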


DOI: 10.21437/Interspeech.2020-1964

Cite as: Nakashika, T. (2020) Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra. Proc. Interspeech 2020, 2002-2006, DOI: 10.21437/Interspeech.2020-1964.


@inproceedings{Nakashika2020,
  author={Toru Nakashika},
  title={{Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2002--2006},
  doi={10.21437/Interspeech.2020-1964},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1964}
}