GAN-Based Data Generation for Speech Emotion Recognition

Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumanati


In this work, we propose a GAN-based method to generate synthetic data for speech emotion recognition. Specifically, we investigate the use of GANs for capturing the data manifold when the data is eyes-off, i.e., when we can train networks on the data but cannot copy it from the clients. We propose a CNN-based GAN with spectral normalization on both the generator and the discriminator, both of which are pre-trained on large unlabeled speech corpora. We show that our method provides better speech emotion recognition performance than a strong baseline. Furthermore, we show that even after the data on the client is lost, our model can generate similar data that can be used for model bootstrapping in the future. Although we evaluate our method on speech emotion recognition, it can be applied to other tasks.
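
As a rough illustration of the spectral normalization mentioned in the abstract, the sketch below rescales a weight matrix by an estimate of its largest singular value, obtained with power iteration. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the network code, so the function names (`spectral_norm`, `spectrally_normalize`), the iteration count, and the toy 2x2 matrix are all hypothetical, and a real CNN would apply the same idea to each reshaped convolution kernel.

```python
import math
import random


def matvec(W, v):
    """Return W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Return W^T @ u for a matrix stored as a list of rows."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(x, eps=1e-12):
    """Scale x to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    for _ in range(n_iters):
        v = l2_normalize(matvec_t(W, u))  # right singular vector estimate
        u = l2_normalize(matvec(W, v))    # left singular vector estimate
    # sigma is approximated by u^T W v
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Divide W by its estimated spectral norm (assumes W is not all zeros)."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


# Hypothetical toy weights: singular values are 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
```

After normalization the largest singular value of `W_sn` is close to 1, which bounds how much the layer can stretch its input. Deep learning frameworks usually ship this as a layer wrapper (for example `torch.nn.utils.spectral_norm` in PyTorch) that runs a single power-iteration step per training update instead of many.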


DOI: 10.21437/Interspeech.2020-2898

Cite as: Eskimez, S.E., Dimitriadis, D., Gmyr, R., Kumanati, K. (2020) GAN-Based Data Generation for Speech Emotion Recognition. Proc. Interspeech 2020, 3446-3450, DOI: 10.21437/Interspeech.2020-2898.


@inproceedings{Eskimez2020,
  author={Sefik Emre Eskimez and Dimitrios Dimitriadis and Robert Gmyr and Kenichi Kumanati},
  title={{GAN-Based Data Generation for Speech Emotion Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3446--3450},
  doi={10.21437/Interspeech.2020-2898},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2898}
}