Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection

Tianjiao Xu, Hui Zhang, Xueliang Zhang


Voice activity detection (VAD) is an essential component of speech signal processing systems, which require low computational cost and real-time operation. Likelihood ratio test (LRT) based VAD is a widely used and effective approach in many applications. However, it remains challenging in low signal-to-noise ratio (SNR) and non-stationary noise scenarios. To cope with this challenge, we propose a supervised masking-based parameter estimation module with an adaptive threshold to improve the performance of a state-of-the-art LRT-based VAD. Moreover, with real-time processing in mind, we compare the proposed method with corresponding end-to-end supervised learning approaches at various model sizes. Experimental results show that the proposed method consistently outperforms both the existing LRT-based method and the end-to-end supervised learning approaches.


DOI: 10.21437/Interspeech.2020-1177

Cite as: Xu, T., Zhang, H., Zhang, X. (2020) Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection. Proc. Interspeech 2020, 3675-3679, DOI: 10.21437/Interspeech.2020-1177.


@inproceedings{Xu2020,
  author={Tianjiao Xu and Hui Zhang and Xueliang Zhang},
  title={{Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3675--3679},
  doi={10.21437/Interspeech.2020-1177},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1177}
}