Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection

Zhenchun Lei, Yingen Yang, Changhong Liu, Jihua Ye


The security and reliability of automatic speaker verification (ASV) systems can be threatened by different types of spoofing attacks using speech synthesis, voice conversion, or replay. A 2-class Gaussian Mixture Model (GMM) classifier for genuine and spoofed speech is commonly used as the baseline in the ASVspoof challenge, which is designed to develop generalized countermeasures with the potential to detect varying and unforeseen spoofing attacks. In the scoring phase, the GMM accumulates the scores of all frames in a test utterance independently and does not consider the relationship between adjacent frames. We propose a 1-D Convolutional Neural Network (CNN) whose input is the log-probabilities of the speech frames on the GMM components. The new model considers not only the score distribution over the GMM components but also the local relationship between frames, and pooling is used to extract global characteristics of the utterance. We also propose a Siamese CNN, which is based on two GMMs trained on genuine and spoofed speech, respectively. Experiments on the logical and physical access scenarios of the ASVspoof 2019 challenge show that the proposed model can greatly improve performance compared with the baseline systems.
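
To make the feature pipeline concrete, the following is a minimal NumPy sketch of the idea described in the abstract, not the authors' implementation. It turns each frame into a vector of per-component Gaussian log-probabilities, applies a 1-D convolution along the time axis, and pools over time. Everything specific here is an assumption made for illustration: the function names, diagonal covariances, omitting the mixture weights from the feature, the ReLU non-linearity, global max pooling, separate (unshared) kernels for the two Siamese branches, and concatenating the two branch outputs.

```python
import numpy as np


def gmm_component_log_probs(frames, means, variances):
    """Per-frame, per-component Gaussian log-densities.

    frames: (T, D) acoustic features; means, variances: (K, D) for a
    diagonal-covariance GMM (assumed). Returns a (T, K) matrix.
    """
    _, dim = frames.shape
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_norm = -0.5 * (dim * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))     # (K,)
    return log_norm[None, :] - 0.5 * mahal


def conv1d_global_max(features, kernels, bias):
    """1-D convolution over time, ReLU, then global max pooling.

    features: (T, K); kernels: (F, W, K); bias: (F,). Returns (F,).
    Each kernel spans W adjacent frames, so neighbouring frames interact.
    """
    num_frames = features.shape[0]
    width = kernels.shape[1]
    windows = np.stack([features[t:t + width]
                        for t in range(num_frames - width + 1)])  # (T-W+1, W, K)
    acts = np.einsum("twk,fwk->tf", windows, kernels) + bias
    acts = np.maximum(acts, 0.0)          # ReLU (assumed non-linearity)
    return acts.max(axis=0)               # pool over the whole utterance


def siamese_embedding(frames, gmm_genuine, gmm_spoof,
                      branch_genuine, branch_spoof):
    """Concatenate the pooled outputs of two branches, one per GMM.

    gmm_* are (means, variances) tuples; branch_* are (kernels, bias)
    tuples. Unshared kernels and concatenation are assumptions.
    """
    emb_g = conv1d_global_max(
        gmm_component_log_probs(frames, *gmm_genuine), *branch_genuine)
    emb_s = conv1d_global_max(
        gmm_component_log_probs(frames, *gmm_spoof), *branch_spoof)
    return np.concatenate([emb_g, emb_s])
```

In a full system the resulting utterance-level vector would feed a classifier that separates genuine from spoofed speech, and the GMM parameters would come from models trained on the two classes; the abstract does not give those details, so they are left out of the sketch.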


DOI: 10.21437/Interspeech.2020-2723

Cite as: Lei, Z., Yang, Y., Liu, C., Ye, J. (2020) Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection. Proc. Interspeech 2020, 1116-1120, DOI: 10.21437/Interspeech.2020-2723.


@inproceedings{Lei2020,
  author={Zhenchun Lei and Yingen Yang and Changhong Liu and Jihua Ye},
  title={{Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1116--1120},
  doi={10.21437/Interspeech.2020-2723},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2723}
}