A Deep Identity Representation for Noise Robust Spoofing Detection

Alejandro Gómez Alanís, Antonio M. Peinado, Jose A. Gonzalez, Angel Gomez

Spoofing attacks against automatic speaker verification (ASV) systems have recently received increased attention, and a number of countermeasures have been developed to detect high-technology attacks such as speech synthesis and voice conversion. However, the performance of anti-spoofing systems degrades significantly in noisy conditions. To address this issue, we propose a deep learning framework for extracting spoofing identity vectors, together with the use of soft missing-data masks. The proposed feature extractor combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to provide a single deep feature vector per utterance. The CNN acts as a convolutional feature extractor operating at the frame level, and the RNN, applied on top of the CNN outputs, produces a single spoofing identity representation of the whole utterance. Experimental evaluation is carried out on both a clean and a noisy version of the ASVSpoof2015 corpus. The experimental results show that our proposals clearly outperform other recently proposed methods, such as the popular CQCC+GMM system and other similar deep feature systems, in both seen and unseen noisy conditions.
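
The data flow described in the abstract (frame-level convolutional features, then a recurrent layer that summarizes the whole utterance into one vector) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the layer sizes, the 1-D convolution over a small context window, the plain tanh recurrent cell, and the random, untrained weights. The soft missing-data masks are left out because the abstract does not state how they are applied.

```python
import math
import random


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical helper: a rows x cols list of lists with small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def conv_frame_features(spec, kernels, context=2):
    """Frame-level convolutional features (stand-in for the CNN).

    spec:    T x F list of spectral frames.
    kernels: list of (2*context+1) x F filters, each spanning a window of
             neighbouring frames and all frequency bins.
    Returns a T x len(kernels) list of ReLU activations, one row per frame.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    feats = []
    for t in range(n_frames):
        # Clamp indices at the edges so every frame gets a full context window.
        window = [spec[min(max(t + d, 0), n_frames - 1)]
                  for d in range(-context, context + 1)]
        frame_feat = []
        for k in kernels:
            s = sum(k[i][j] * window[i][j]
                    for i in range(len(window)) for j in range(n_bins))
            frame_feat.append(max(0.0, s))  # ReLU
        feats.append(frame_feat)
    return feats


def rnn_utterance_vector(feats, W_in, W_rec, bias):
    """Run a plain tanh recurrent cell over the frames and return the final
    hidden state as the single utterance-level vector (stand-in for the RNN)."""
    hidden = len(bias)
    h = [0.0] * hidden
    for x in feats:
        h = [math.tanh(sum(W_in[i][j] * x[j] for j in range(len(x)))
                       + sum(W_rec[i][j] * h[j] for j in range(hidden))
                       + bias[i])
             for i in range(hidden)]
    return h


def extract_identity_vector(spec, kernels, W_in, W_rec, bias, context=2):
    """Frame-level features first, then one fixed-size vector per utterance."""
    frame_feats = conv_frame_features(spec, kernels, context)
    return rnn_utterance_vector(frame_feats, W_in, W_rec, bias)


# Illustrative sizes only; the weights are random and untrained.
rng = random.Random(0)
n_bins, n_kernels, hidden, context = 8, 4, 6, 2
kernels = [random_matrix(2 * context + 1, n_bins, rng) for _ in range(n_kernels)]
W_in = random_matrix(hidden, n_kernels, rng)
W_rec = random_matrix(hidden, hidden, rng)
bias = [0.0] * hidden

# Two synthetic "utterances" of different lengths (30 and 75 frames).
short_utt = random_matrix(30, n_bins, rng, scale=1.0)
long_utt = random_matrix(75, n_bins, rng, scale=1.0)

vec_short = extract_identity_vector(short_utt, kernels, W_in, W_rec, bias, context)
vec_long = extract_identity_vector(long_utt, kernels, W_in, W_rec, bias, context)
print(len(vec_short), len(vec_long))
```

The point of the sketch is that utterances of different durations map to vectors of the same dimensionality, which is what allows a fixed-input back-end classifier to score each utterance as genuine or spoofed.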

DOI: 10.21437/Interspeech.2018-1909

Cite as: Gómez Alanís, A., Peinado, A.M., Gonzalez, J.A., Gomez, A. (2018) A Deep Identity Representation for Noise Robust Spoofing Detection. Proc. Interspeech 2018, 676-680, DOI: 10.21437/Interspeech.2018-1909.

@inproceedings{GomezAlanis2018,
  author={Alejandro {Gómez Alanís} and Antonio M. Peinado and Jose A. Gonzalez and Angel Gomez},
  title={A Deep Identity Representation for Noise Robust Spoofing Detection},
  booktitle={Proc. Interspeech 2018},