Blind Channel Response Estimation for Replay Attack Detection

Anderson R. Avila, Jahangir Alam, Douglas O’Shaughnessy, Tiago H. Falk

Automatic speaker verification (ASV) systems are known to be vulnerable to replay attacks, and the research community has made multiple efforts to improve ASV robustness. In this paper, we propose a replay attack countermeasure based on blind estimation of the magnitude of channel responses. To that end, the log-spectrum average of the clean speech signal is predicted from a Gaussian mixture model (GMM) of RASTA-filtered mel-frequency cepstral coefficients (MFCCs) trained on clean speech. The magnitude response of the channel is obtained by subtracting the log-spectrum of the observed signal from the predicted log-spectrum average of the clean signal. Two datasets are used in our experiments: (1) the TIMIT dataset, which is used to train the GMM that predicts the log-spectrum average of the clean signal; and (2) a dataset containing replay attacks used during the second Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017). Performance is compared to two benchmarks: the discrete Fourier transform power spectrum (DFTspec) and constant Q cepstral coefficients (CQCCs). The proposed method outperforms the two benchmarks in most scenarios, with an equal error rate (EER) as low as 6.87% on the development set and 11.28% on the evaluation set.
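
The log-spectrum subtraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, frame length, hop size, and Hann window are assumptions chosen for illustration, and the clean log-spectrum average is computed directly from a known clean signal instead of being predicted by the GMM. Under the usual convolutive model, log|Y| = log|X| + log|H|, so the sketch takes the observed average minus the clean average; subtracting in the opposite order gives the same estimate with the sign flipped.

```python
import numpy as np


def log_spectrum_average(signal, frame_len=512, hop=256, eps=1e-10):
    """Average log-magnitude spectrum over Hann-windowed frames.

    frame_len, hop and eps are illustrative assumptions, not values
    taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids log(0) for silent bins
    return np.log(magnitude + eps).mean(axis=0)


def estimate_channel_log_magnitude(observed, clean_log_avg):
    """Channel log-magnitude estimate: observed average minus clean average.

    clean_log_avg stands in for the GMM-predicted clean log-spectrum
    average; here the caller supplies it directly (an assumption).
    """
    return log_spectrum_average(observed) - clean_log_avg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)      # stand-in for clean speech
    h = np.array([1.0, 0.5])                # hypothetical channel impulse response
    observed = np.convolve(clean, h)[: len(clean)]

    estimate = estimate_channel_log_magnitude(observed, log_spectrum_average(clean))
    true_log_mag = np.log(np.abs(np.fft.rfft(h, 512)))
    print("mean abs error:", np.mean(np.abs(estimate - true_log_mag)))
```

In the blind setting of the paper the clean signal is not available, which is why the clean log-spectrum average has to be predicted from the observed signal; the sketch only shows the subtraction step once such a prediction exists.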

DOI: 10.21437/Interspeech.2019-2956

Cite as: Avila, A.R., Alam, J., O’Shaughnessy, D., Falk, T.H. (2019) Blind Channel Response Estimation for Replay Attack Detection. Proc. Interspeech 2019, 2893-2897, DOI: 10.21437/Interspeech.2019-2956.

  author={Anderson R. Avila and Jahangir Alam and Douglas O’Shaughnessy and Tiago H. Falk},
  title={{Blind Channel Response Estimation for Replay Attack Detection}},
  booktitle={Proc. Interspeech 2019},