Long Range Acoustic Features for Spoofed Speech Detection

Rohan Kumar Das, Jichen Yang, Haizhou Li

Speaker verification systems are vulnerable to spoofing attacks in practice. High-quality recording and playback devices make replay attacks a real threat to speaker verification. In addition, advances in voice conversion and speech synthesis now produce perceptually natural-sounding speech. The ASVspoof 2019 challenge was organized to study the robustness of countermeasures against such attacks and covers two common attack modes: logical access and physical access. The former deals with synthetic attacks arising from voice conversion and text-to-speech techniques, whereas the latter deals with replay attacks. In this work, we explore several novel countermeasures based on long range acoustic features that are found to be effective for spoofing attack detection. These features capture different aspects of long range information because they are computed from subbands and the octave power spectrum rather than, as is conventional, from the linear power spectrum. The novel features are combined with other known features for improved detection of spoofing attacks. The best combined system submitted to the challenge obtains a tandem detection cost function of 0.1264 and 0.1381 (equal error rates of 4.13% and 5.95%) for logical and physical access, respectively.
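
The equal error rate quoted above is the operating point at which a countermeasure accepts spoofed trials as often as it rejects bona fide ones. The following is a minimal sketch of that calculation, not the authors' or the challenge's scoring code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are assumptions made for illustration. The tandem detection cost function is a separate metric that also depends on the speaker verification system and is not computed here.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a spoofing countermeasure.

    Assumption for this sketch: higher scores mean "more likely bona fide".
    A trial is accepted when its score is >= the threshold.
    """
    bonafide = [float(s) for s in bonafide_scores]
    spoof = [float(s) for s in spoof_scores]
    if not bonafide or not spoof:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(bonafide + spoof)) + [float("inf")]

    best = None
    for t in thresholds:
        # False acceptance rate: spoofed trials that pass the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        # False rejection rate: bona fide trials that fall below it.
        frr = sum(s < t for s in bonafide) / len(bonafide)
        gap = abs(far - frr)
        # Keep the first threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Toy scores (hypothetical, not from the paper).
eer, threshold = equal_error_rate([1.0, 3.0], [0.0, 2.0])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of scores the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; evaluation toolkits may interpolate instead, which can give slightly different values.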

DOI: 10.21437/Interspeech.2019-1887

Cite as: Das, R.K., Yang, J., Li, H. (2019) Long Range Acoustic Features for Spoofed Speech Detection. Proc. Interspeech 2019, 1058-1062, DOI: 10.21437/Interspeech.2019-1887.

  author={Rohan Kumar Das and Jichen Yang and Haizhou Li},
  title={{Long Range Acoustic Features for Spoofed Speech Detection}},
  booktitle={Proc. Interspeech 2019},