Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection

Prasad Tapkir, Hemant Patil

Advances in Automatic Speaker Verification (ASV) systems for voice biometrics come with the danger of spoofing attacks. The replay attack is the most accessible of these: the attacker impersonates the target speaker by replaying pre-recorded speech samples of that speaker. Most conventional features, such as Mel Frequency Cepstral Coefficients (MFCC) and Instantaneous Frequency Cepstral Coefficients (IFCC), use a filterbank structure for feature extraction. In this paper, we propose a novel Empirical Mode Decomposition Cepstral Coefficient (EMDCC) feature set, in which the filterbank in MFCC is replaced with Empirical Mode Decomposition (EMD) to obtain the subband signals. The proposed feature set takes advantage of EMD, which acts as a dyadic filterbank and handles the nonlinear and non-stationary nature of the speech signal. On the evaluation set of the ASVspoof 2017 Challenge database, the stand-alone EMDCC feature set gives an Equal Error Rate (EER) of 28.06%, compared to EERs of 29.18% and 31.3% for the baseline Constant Q Cepstral Coefficient (CQCC) and MFCC systems, respectively. Furthermore, when the proposed feature set is fused with the Linear Frequency Modified Group Delay Cepstral Coefficient (LFMGDCC) feature set at score level, we obtain a reduced EER of 18.36% on the evaluation set.
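To make the idea of replacing the MFCC filterbank concrete, the following is a minimal sketch of the cepstral stage only, not the authors' implementation. It assumes the subband signals (for EMDCC, the intrinsic mode functions of one speech frame) have already been produced by some EMD routine, which is not shown. The function name `subband_cepstrum`, the use of mean energy per subband, the energy floor, and the unnormalised DCT-II are all illustrative assumptions; the abstract does not specify these details.

```python
import math

def subband_cepstrum(subbands, num_ceps, floor=1e-12):
    """Hypothetical helper: log energy per subband, then a DCT-II.

    `subbands` is a list of non-empty sample lists, one per subband
    (assumed here to be the IMFs of a single frame).
    """
    # Log of each subband's mean energy, floored to avoid log(0).
    log_e = [
        math.log(max(sum(x * x for x in band) / len(band), floor))
        for band in subbands
    ]
    n = len(log_e)
    # Unnormalised DCT-II over the log energies gives the cepstral coefficients.
    return [
        sum(log_e[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
        for q in range(num_ceps)
    ]

# Toy input: two made-up "subbands" standing in for IMFs.
coeffs = subband_cepstrum([[1.0, -1.0], [2.0, 2.0]], num_ceps=2)
```

In an MFCC pipeline the log energies would come from mel filterbank outputs; here they come from the subband signals directly, which is the substitution the abstract describes.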

DOI: 10.21437/Interspeech.2018-1661

Cite as: Tapkir, P., Patil, H. (2018) Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection. Proc. Interspeech 2018, 721-725, DOI: 10.21437/Interspeech.2018-1661.

  author={Prasad Tapkir and Hemant Patil},
  title={Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection},
  booktitle={Proc. Interspeech 2018},