Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection

Hemlata Tak, Hemant Patil

Replay attacks pose the most difficult challenge for the development of countermeasures for spoofed speech detection (SSD) systems. Earlier studies mainly used vocal tract-based (segmental) information for replay detection. However, during replay, excitation source-based information is also affected (in particular, pitch source harmonics are degraded in higher frequency regions) by the recording environment and replay devices. Hence, in addition to vocal tract (system) information, we also explore excitation source information for SSD. In particular, we use Linear Frequency Residual Cepstral Coefficients (LFRCC) for replay detection. The objective of this paper is to explore possible complementary excitation (glottal) source information present in Linear Prediction (LP) residual-based features. Experiments were performed on the ASVspoof 2017 Challenge database with Gaussian Mixture Model (GMM) and Convolutional Neural Network (CNN) classifiers. When we combined the source- and system-based information, we obtained, on average, 28.77% and 42.72% relative decreases in Equal Error Rate (EER) on the development and evaluation sets, respectively. Furthermore, when we performed score-level fusion of feature sets (for a fixed classifier) followed by classifier-level fusion of GMM and CNN (for a fixed feature set), we obtained reduced EERs of 2.40% and 9.06% on the dev and eval sets, respectively.
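The abstract does not spell out the LFRCC pipeline, so the following is only a minimal sketch of what residual cepstral features of this kind might look like. It assumes that the LP residual is obtained per frame with the autocorrelation method, that the residual power spectrum is passed through linearly spaced triangular filters, and that a DCT of the log filterbank energies gives the cepstral coefficients. All function names and parameter values (frame length, hop, LP order, number of filters and coefficients) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def lp_residual(frame, order=12):
    """LP residual of one frame (autocorrelation method; an assumption)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], lightly regularised for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Prediction s_hat[t] = sum_k a[k] * s[t-k]; the residual is s - s_hat
    pred = np.zeros(n)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * frame[:n - k]
    return frame - pred

def linear_filterbank(n_filters, n_fft):
    """Triangular filters with centres spaced linearly over the FFT bins."""
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_out):
    """First n_out coefficients of an unnormalised DCT-II."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x

def lfrcc_sketch(signal, frame_len=400, hop=160, order=12,
                 n_fft=512, n_filters=40, n_ceps=20):
    """Frame-wise residual cepstral features; all defaults are hypothetical."""
    signal = np.asarray(signal, dtype=float)
    fb = linear_filterbank(n_filters, n_fft)
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        res = lp_residual(frame, order)
        power = np.abs(np.fft.rfft(res, n_fft)) ** 2
        log_energy = np.log(fb @ power + 1e-10)
        feats.append(dct2(log_energy, n_ceps))
    return np.array(feats)
```

Calling `lfrcc_sketch(x)` on a one-dimensional waveform `x` returns one row of coefficients per frame. The linear (rather than mel) filter spacing is chosen here because it gives the higher frequency regions, where the abstract reports harmonic degradation, the same resolution as the lower ones.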

DOI: 10.21437/Interspeech.2018-1702

Cite as: Tak, H., Patil, H. (2018) Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection. Proc. Interspeech 2018, 726-730, DOI: 10.21437/Interspeech.2018-1702.

  author={Hemlata Tak and Hemant Patil},
  title={Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection},
  booktitle={Proc. Interspeech 2018},