Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric

Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Replay attacks are the simplest and most accessible form of spoofing attack on voice biometric systems, and they can be hard to detect with systems designed to identify spoofing attacks based on synthesised speech. In this paper, we propose a novel approach that detects replayed speech by evaluating the similarity between pairs of speech samples, using an embedding learned by a deep Siamese architecture. Specifically, we train a deep Siamese network to identify pairs of genuine speech samples and pairs of replayed speech samples as ‘similar’, and mixed pairs of genuine and replayed speech as ‘dissimilar’. Siamese networks are particularly suited to this task and have been shown to be effective in problems where intra-class variability is large and the number of training samples per class is relatively small. The internal low-dimensional embedding learnt by the Siamese network to accomplish this task is then used as the basis for replay detection. The proposed approach outperforms state-of-the-art systems when evaluated on the ASVspoof 2017 challenge corpus, without relying on fusion with other sub-systems.
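
The pairing scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not state which loss function is used, so the contrastive loss below is an assumption (it is a common choice for Siamese networks), and the names `GENUINE`, `REPLAY`, `make_pairs`, and `contrastive_loss` are hypothetical, introduced only for illustration.

```python
import itertools
import math

# Hypothetical class labels for the two kinds of speech sample.
GENUINE, REPLAY = 0, 1


def make_pairs(labels):
    """Build every unordered pair of sample indices with a similarity target.

    Target 1 ('similar'): genuine-genuine or replay-replay.
    Target 0 ('dissimilar'): one genuine and one replayed sample.
    """
    return [
        (i, j, 1 if labels[i] == labels[j] else 0)
        for i, j in itertools.combinations(range(len(labels)), 2)
    ]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, target, margin=1.0):
    """Contrastive loss (an assumed choice, not confirmed by the abstract).

    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only when they lie closer than the margin.
    """
    d = euclidean(emb_a, emb_b)
    if target == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    labels = [GENUINE, GENUINE, REPLAY, REPLAY]
    for i, j, target in make_pairs(labels):
        print(i, j, "similar" if target else "dissimilar")
```

In a sketch like this, the embeddings passed to `contrastive_loss` would come from the two weight-sharing branches of the network; minimising the loss pulls same-class pairs together and pushes mixed pairs apart, which is what makes the learned embedding usable for replay detection.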

DOI: 10.21437/Interspeech.2018-1819

Cite as: Sriskandaraja, K., Sethu, V., Ambikairajah, E. (2018) Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric. Proc. Interspeech 2018, 671-675, DOI: 10.21437/Interspeech.2018-1819.
