x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification

Jesús Villalba, Yuekai Zhang, Najim Dehak


Automatic Speaker Verification (ASV) enables high-security applications like user authentication or criminal investigation. However, ASV can be subjected to malicious attacks, which could compromise that security. The ASV literature mainly studies spoofing (a.k.a. impersonation) attacks such as voice replay, synthesis, or conversion. Meanwhile, another class of attacks, known as adversarial attacks, has become a threat to all kinds of machine learning systems. Adversarial attacks introduce an imperceptible perturbation into the input signal that radically changes the behavior of the system. These attacks have been studied intensively in the image domain but less so in the speech domain.

In this work, we investigate the vulnerability of state-of-the-art ASV systems to adversarial attacks. We consider a threat model that consists of adding perturbation noise to the test waveform to alter the ASV decision. We also discuss the methodology and metrics to benchmark adversarial attacks and defenses in ASV. We evaluated three x-vector architectures, which performed among the best in recent ASV evaluations, against fast gradient sign and Carlini-Wagner attacks. All networks were highly vulnerable in the white-box attack scenario, even at high SNR (30–60 dB). Furthermore, we successfully transferred attacks generated with smaller white-box networks to attack a larger black-box network.
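
The threat model above (perturbation noise added to the test waveform, with strength reported as SNR) can be illustrated with a minimal fast-gradient-sign sketch. This is not the authors' implementation: the linear scorer `w`, the helper names `snr_db` and `fgsm_perturb`, and the random "waveform" are all hypothetical stand-ins, chosen so that the gradient of the score is simply `w`. In a real x-vector system the gradient would be obtained by backpropagating the verification score through the network.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB between a clean signal and the added noise."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))


def fgsm_perturb(x, grad, target_snr_db, is_target_trial):
    """One fast-gradient-sign step, scaled to reach a requested SNR.

    Assumption: grad has no exact zeros, so every sample moves by +/- eps
    and the perturbation power is eps**2 per sample.
    """
    # Step size that yields the requested SNR for a +/- eps perturbation.
    eps = np.sqrt(np.mean(x ** 2)) * 10.0 ** (-target_snr_db / 20.0)
    # Target trial: push the score down (false rejection).
    # Non-target trial: push the score up (false acceptance).
    direction = -1.0 if is_target_trial else 1.0
    return x + direction * eps * np.sign(grad)


# Toy setup (hypothetical): a 1 s "waveform" at 16 kHz and a linear scorer.
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(16000)
w = rng.standard_normal(16000)  # stand-in for the ASV scoring function


def score(s):
    """Toy verification score; its gradient with respect to s is w."""
    return float(w @ s)


x_adv = fgsm_perturb(x, w, target_snr_db=40.0, is_target_trial=True)
achieved_snr = snr_db(x, x_adv - x)
```

In this toy setting the perturbation sits at the requested 40 dB SNR, within the 30–60 dB range mentioned above, and still lowers the score, because every sample moves against the gradient.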


DOI: 10.21437/Interspeech.2020-2458

Cite as: Villalba, J., Zhang, Y., Dehak, N. (2020) x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification. Proc. Interspeech 2020, 4233-4237, DOI: 10.21437/Interspeech.2020-2458.


@inproceedings{Villalba2020,
  author={Jesús Villalba and Yuekai Zhang and Najim Dehak},
  title={{x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4233--4237},
  doi={10.21437/Interspeech.2020-2458},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2458}
}