Adversarial Regularization for End-to-End Robust Speaker Verification

Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H.L. Hansen

Deep learning has been used successfully in speaker verification (SV), especially in end-to-end SV systems, which have recently attracted growing interest. It has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we explore two methods of generating adversarial examples for advanced SV: (i) the fast gradient-sign method (FGSM), and (ii) the local distributional smoothness (LDS) method. We first use these adversarial examples to attack an end-to-end SV system, and experiments show that the neural network is easily disturbed by them. Next, we propose to train a robust end-to-end SV model that uses the two kinds of adversarial examples for model regularization. Experimental results on the TIMIT dataset indicate that the regularized model yields a relative EER improvement of (i) 18.89% and (ii) 5.54% on the original test set. In addition, the regularized model yields a relative EER improvement of (i) 30.11% and (ii) 22.12% on the adversarial-example test set, which suggests more consistent performance under adversarial-example attacks.
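As an illustration of the FGSM idea named in the abstract, the sketch below perturbs an input by a small step in the direction of the sign of the loss gradient, x_adv = x + epsilon * sign(grad_x L(x, y)), and then adds the loss on the perturbed input as a regularization term. It is a minimal sketch under stated assumptions, not the system described in the paper: a toy logistic scorer stands in for the end-to-end SV network, and the weights `w`, bias `b`, step size `epsilon`, and mixing weight `alpha` are all hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Toy logistic scorer (an assumption; stands in for the SV network).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    # Binary cross-entropy for a single example with label y in {0, 1}.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    # Analytic gradient of the cross-entropy loss with respect to the input x.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    # FGSM: move each input dimension by epsilon along the gradient sign.
    grad = input_gradient(w, b, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

def regularized_loss(w, b, x, y, epsilon, alpha):
    # Clean loss plus a weighted loss on the FGSM example.
    # alpha is a hypothetical mixing weight, not a value from the paper.
    x_adv = fgsm(w, b, x, y, epsilon)
    return loss(w, b, x, y) + alpha * loss(w, b, x_adv, y)

# Hypothetical toy values for illustration only.
w = [0.8, -1.2, 0.5]
b = 0.1
x = [1.0, 0.5, -0.3]
y = 1
epsilon = 0.1

x_adv = fgsm(w, b, x, y, epsilon)
clean = loss(w, b, x, y)
attacked = loss(w, b, x_adv, y)
total = regularized_loss(w, b, x, y, epsilon, alpha=0.5)
```

In this toy setting the perturbation is bounded by `epsilon` in every dimension, yet the loss on `x_adv` is higher than on `x`, which is the behavior an attack exploits and the regularization term penalizes during training.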

DOI: 10.21437/Interspeech.2019-2983

Cite as: Wang, Q., Guo, P., Sun, S., Xie, L., Hansen, J.H. (2019) Adversarial Regularization for End-to-End Robust Speaker Verification. Proc. Interspeech 2019, 4010-4014, DOI: 10.21437/Interspeech.2019-2983.

  author={Qing Wang and Pengcheng Guo and Sining Sun and Lei Xie and John H.L. Hansen},
  title={{Adversarial Regularization for End-to-End Robust Speaker Verification}},
  booktitle={Proc. Interspeech 2019},