Contrastive Predictive Coding of Audio with an Adversary

Luyu Wang, Kazuya Kawakami, Aaron van den Oord

With the vast amount of audio data available, powerful sound representations can be learned with self-supervised methods even in the absence of explicit annotations. In this work we investigate learning general audio representations directly from raw signals using the Contrastive Predictive Coding objective. We further extend it by leveraging ideas from adversarial machine learning to produce additive perturbations that effectively make learning harder, so that the predictive task is not distracted by trivial details. We also study the effects of different design choices for the objective, including the nonlinear similarity measure and the way negatives are drawn. Combining these contributions, our models considerably outperform previous spectrogram-based unsupervised methods. On AudioSet we observe a relative improvement of 14% in mean average precision over the state of the art while using half as much training data.
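The two ingredients the abstract names, a contrastive (InfoNCE-style) predictive objective and an adversarial additive perturbation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bilinear similarity matrix `W`, the FGSM-style sign update, and the function names are assumptions chosen for clarity.

```python
import numpy as np

def info_nce_loss(context, positive, negatives, W):
    """Contrastive predictive loss (sketch).

    context:   (d,) context vector c_t
    positive:  (d,) true future latent z_{t+k}
    negatives: (n, d) distractors drawn from other positions or clips
    W:         (d, d) prediction matrix for step k (bilinear similarity
               is one common choice; the paper also explores nonlinear ones)
    """
    pred = W @ context                                  # predicted future latent
    cand = np.vstack([positive[None, :], negatives])    # row 0 is the positive
    logits = cand @ pred                                # similarity scores
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    return -log_probs[0]                                # -log p(positive)

def adversarial_perturb(x, loss_grad_fn, eps=0.1):
    """FGSM-style additive perturbation of the input (sketch).

    Moves x a small step in the direction that increases the contrastive
    loss, making the prediction task harder; loss_grad_fn is assumed to
    return the gradient of the loss with respect to x.
    """
    return x + eps * np.sign(loss_grad_fn(x))
```

A well-predicted positive (aligned with `W @ context`) yields a lower loss than a mismatched one, which is exactly the signal the adversary then tries to degrade by perturbing the input.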

DOI: 10.21437/Interspeech.2020-1891

Cite as: Wang, L., Kawakami, K., Oord, A.V.D. (2020) Contrastive Predictive Coding of Audio with an Adversary. Proc. Interspeech 2020, 826-830, DOI: 10.21437/Interspeech.2020-1891.

@inproceedings{wang20_interspeech,
  author={Luyu Wang and Kazuya Kawakami and Aaron van den Oord},
  title={{Contrastive Predictive Coding of Audio with an Adversary}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={826--830},
  doi={10.21437/Interspeech.2020-1891}
}