Adversarial Separation Network for Speaker Recognition

Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei

Deep neural networks (DNNs) have achieved great success in speaker recognition systems. However, DNN-based systems are easily deceived by adversarial examples, which lead to wrong predictions. Adversarial examples, generated by adding purposeful perturbations to natural examples, pose a serious security threat. In this study, we propose the adversarial separation network (AS-Net) to protect speaker recognition systems against adversarial attacks. AS-Net is characterized by its ability to separate the adversarial perturbation from the test speech and thereby restore the natural clean speech. It operates as a standalone component: each input speech is first pre-processed by AS-Net. Furthermore, we incorporate a compression structure and a speaker quality loss to enhance the capacity of AS-Net. Experimental results on the VCTK dataset demonstrate that AS-Net effectively enhances the robustness of speaker recognition systems against adversarial examples. It also significantly outperforms other state-of-the-art adversarial-detection mechanisms, including the adversarial perturbation elimination network (APE-GAN), feature squeezing, and adversarial training.
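
The two ideas in the abstract, an additive adversarial perturbation and a pre-processing stage that subtracts an estimate of it, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "recognizer" is a two-speaker linear softmax classifier, the attack is the fast gradient sign method (FGSM), and the separator is an oracle stand-in that already knows the perturbation. AS-Net itself is a trained network whose architecture and losses are not reproduced here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(W, x):
    # Toy "speaker recognizer": pick the speaker with the highest linear score.
    return int(np.argmax(W @ x))

def fgsm(W, x, label, eps):
    # FGSM: step of size eps along the sign of the cross-entropy gradient.
    # For a linear softmax model the input gradient is W^T (p - onehot(label)).
    p = softmax(W @ x)
    p[label] -= 1.0
    return x + eps * np.sign(W.T @ p)

def defended_predict(W, x, estimate_perturbation):
    # Pre-processing defence: subtract the estimated perturbation,
    # then run the unchanged recognizer on the restored input.
    restored = x - estimate_perturbation(x)
    return predict(W, restored)

# Hypothetical two-speaker setup (all values are illustrative).
W = np.eye(2)
x_clean = np.array([1.0, 0.8])   # clean features, true speaker 0
x_adv = fgsm(W, x_clean, label=0, eps=0.3)

# Oracle stand-in for a learned separator: it returns the true perturbation.
oracle_separator = lambda x: x_adv - x_clean

clean_pred = predict(W, x_clean)
adv_pred = predict(W, x_adv)
defended_pred = defended_predict(W, x_adv, oracle_separator)
```

In this toy setting the perturbation flips the recognizer's decision, and subtracting it restores the original decision. A real separator has to estimate the perturbation from the perturbed input alone, which is the part the learned network is meant to solve.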

 DOI: 10.21437/Interspeech.2020-1966

Cite as: Zhang, H., Wang, L., Zhang, Y., Liu, M., Lee, K.A., Wei, J. (2020) Adversarial Separation Network for Speaker Recognition. Proc. Interspeech 2020, 951-955, DOI: 10.21437/Interspeech.2020-1966.

  author={Hanyi Zhang and Longbiao Wang and Yunchun Zhang and Meng Liu and Kong Aik Lee and Jianguo Wei},
  title={{Adversarial Separation Network for Speaker Recognition}},
  booktitle={Proc. Interspeech 2020},