Self-Supervised Spoofing Audio Detection Scheme

Ziyue Jiang, Hongcheng Zhu, Li Peng, Wenbing Ding, Yanzhen Ren


With the development of deep generative technology, spoofing audio produced by speech synthesis and voice conversion is becoming increasingly realistic, which challenges the credibility of media in social networks. This paper proposes a self-supervised spoofing audio detection scheme (SSAD). In SSAD, eight convolutional blocks capture the local features of the audio signal, and a temporal convolutional network (TCN) captures context features while allowing the computation to run in parallel. Three regression workers and one binary worker are designed to improve performance in fake and spoofing audio detection. Experimental results on the ASVspoof 2019 dataset show that SSAD achieves higher detection accuracy than state-of-the-art methods, indicating that the self-supervised method is effective for spoofing audio detection.
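The abstract credits the TCN with capturing context while running in parallel. As a minimal sketch, the code below assumes the generic TCN building block: a dilated causal 1-D convolution whose dilation doubles at each level. It uses one convolution per level, a single channel, and no residual connections or nonlinearities. Each output y[t] depends only on the current and past samples, and no output depends on another output, so all time steps can be computed independently rather than step by step as in an RNN. The function names, kernel size, and number of levels are illustrative assumptions, not the SSAD configuration.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Single-channel dilated causal 1-D convolution (illustrative; no bias or activation).

    y[t] = sum_i kernel[i] * x[t - i * dilation]; samples before t = 0 count as zero,
    so y[t] never reads future samples. Iterations over t share no state, which is
    what lets a TCN evaluate all time steps in parallel.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:  # implicit left zero-padding keeps the layer causal
                acc += w * x[j]
        y.append(acc)
    return y


def tcn_stack(x: Sequence[float], kernel: Sequence[float], num_levels: int) -> List[float]:
    """Hypothetical TCN stack: dilations 1, 2, 4, ... (one conv per level, shared kernel)."""
    h = list(x)
    for level in range(num_levels):
        h = dilated_causal_conv1d(h, kernel, dilation=2 ** level)
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past-and-current input samples that can influence one output."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 11
    print(tcn_stack(impulse, [1.0, 1.0], num_levels=3))
    # prints [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(receptive_field(2, [1, 2, 4]))  # prints 8
```

Feeding an impulse through three levels with kernel size 2 spreads it over exactly 8 samples, which matches the receptive-field formula. Doubling the dilation therefore grows the context exponentially with depth while keeping each layer's cost linear in the sequence length.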


 DOI: 10.21437/Interspeech.2020-1760

Cite as: Jiang, Z., Zhu, H., Peng, L., Ding, W., Ren, Y. (2020) Self-Supervised Spoofing Audio Detection Scheme. Proc. Interspeech 2020, 4223-4227, DOI: 10.21437/Interspeech.2020-1760.


@inproceedings{Jiang2020,
  author={Ziyue Jiang and Hongcheng Zhu and Li Peng and Wenbing Ding and Yanzhen Ren},
  title={{Self-Supervised Spoofing Audio Detection Scheme}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4223--4227},
  doi={10.21437/Interspeech.2020-1760},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1760}
}