Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning

Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao


This study investigates deep-learning-based signal-to-noise ratio (SNR) estimation at the frame level. We propose to employ recurrent neural networks (RNNs) with long short-term memory (LSTM) to leverage contextual information for this task. Because acoustic features are important for deep learning algorithms, we also examine a variety of monaural features and investigate feature combinations using Group Lasso and sequential floating forward selection. Replacing LSTM with bidirectional LSTM naturally extends the proposed algorithm to a long-term SNR estimator. Systematic evaluations demonstrate that the proposed SNR estimators significantly outperform other frame-level and long-term SNR estimators.
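To make the estimation target concrete, the sketch below computes a reference frame-level SNR from time-aligned clean speech and noise signals, which is the kind of per-frame quantity such an estimator would be trained to predict. It is only an illustration under stated assumptions: the abstract does not give the exact SNR definition, frame length, or hop size, so the energy-ratio formula, the 320-sample frame, the 160-sample hop (20 ms and 10 ms at 16 kHz), and the function name `frame_level_snr_db` are all hypothetical choices, not the authors' settings.

```python
import numpy as np


def frame_level_snr_db(speech, noise, frame_len=320, hop=160, eps=1e-10):
    """Per-frame SNR in dB from time-aligned clean speech and noise.

    frame_len, hop, and eps are illustrative assumptions, not values
    taken from the paper.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.ndim != 1 or speech.shape != noise.shape:
        raise ValueError("speech and noise must be 1-D arrays of equal length")
    if len(speech) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Number of full frames that fit in the signal.
    n_frames = 1 + (len(speech) - frame_len) // hop
    snr_db = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop
        s = speech[start:start + frame_len]
        n = noise[start:start + frame_len]
        # Ratio of frame energies; eps guards against log(0) and division by zero.
        snr_db[i] = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(n ** 2) + eps))
    return snr_db
```

For example, a constant speech signal with amplitude 2 mixed with constant noise of amplitude 1 has an energy ratio of 4 in every frame, so each frame's SNR is about 6.02 dB. A long-term SNR, by contrast, would use the energies of the whole utterance rather than those of individual frames.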


DOI: 10.21437/Interspeech.2020-2475

Cite as: Li, H., Wang, D., Zhang, X., Gao, G. (2020) Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning. Proc. Interspeech 2020, 4626-4630, DOI: 10.21437/Interspeech.2020-2475.


@inproceedings{Li2020,
  author={Hao Li and DeLiang Wang and Xueliang Zhang and Guanglai Gao},
  title={{Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4626--4630},
  doi={10.21437/Interspeech.2020-2475},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2475}
}