Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning

Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao

This study investigates deep-learning-based signal-to-noise ratio (SNR) estimation at the frame level. We propose to employ recurrent neural networks (RNNs) with long short-term memory (LSTM) in order to leverage contextual information for this task. As acoustic features are important for deep learning algorithms, we also examine a variety of monaural features and investigate feature combinations using Group Lasso and sequential floating forward selection. Replacing the LSTM with a bidirectional LSTM naturally turns the proposed algorithm into a long-term SNR estimator. Systematic evaluations demonstrate that the proposed SNR estimators significantly outperform other frame-level and long-term SNR estimators.
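To make the quantity being estimated concrete, the sketch below computes a reference frame-level SNR when the clean speech and the noise are available separately. The abstract does not state the exact definition used in the paper; this assumes the common one, 10·log10 of the speech-to-noise energy ratio within each frame. The function name `frame_snr_db`, the frame length, the hop size, and the `eps` floor are all hypothetical choices made for illustration.

```python
import math


def frame_snr_db(speech, noise, frame_len=320, hop=160, eps=1e-12):
    """Return one SNR value in dB per frame.

    Assumed definition (not taken from the paper):
        SNR(t) = 10 * log10(E_speech(t) / E_noise(t))
    where E is the sum of squared samples in frame t.
    frame_len=320 and hop=160 correspond to 20 ms / 10 ms at 16 kHz,
    which is an illustrative assumption.
    """
    n = min(len(speech), len(noise))
    snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = speech[start:start + frame_len]
        d = noise[start:start + frame_len]
        e_s = sum(x * x for x in s)  # speech energy in this frame
        e_d = sum(x * x for x in d)  # noise energy in this frame
        # eps keeps the ratio finite for silent frames
        snrs.append(10.0 * math.log10((e_s + eps) / (e_d + eps)))
    return snrs


# Constant signals with a 10:1 amplitude ratio give 20 dB in every frame.
targets = frame_snr_db([1.0] * 640, [0.1] * 640)
```

Values computed this way could serve as per-frame training targets for a sequence model such as an LSTM, which would then have to predict them from the noisy mixture alone.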

DOI: 10.21437/Interspeech.2020-2475

Cite as: Li, H., Wang, D., Zhang, X., Gao, G. (2020) Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning. Proc. Interspeech 2020, 4626-4630, DOI: 10.21437/Interspeech.2020-2475.

  author={Hao Li and DeLiang Wang and Xueliang Zhang and Guanglai Gao},
  title={{Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning}},
  booktitle={Proc. Interspeech 2020},