Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments

Dung N. Tran, Uros Batricevic, Kazuhito Koishida


Accurate voiced/unvoiced information is crucial for estimating the pitch of a target speech signal in severe nonstationary noise environments. Nevertheless, state-of-the-art pitch estimators based on deep neural networks (DNNs) lack a dedicated mechanism for robustly detecting voiced and unvoiced segments of the target speech in noisy conditions. In this work, we propose an end-to-end deep learning-based pitch estimation framework that jointly detects voiced/unvoiced segments and predicts pitch values for the voiced regions of the ground-truth speech. We empirically show that our proposed framework is significantly more robust than state-of-the-art DNN-based pitch detectors in nonstationary noise settings. Our results suggest that joint training of voiced/unvoiced detection and voiced pitch prediction can significantly improve pitch estimation performance.
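The abstract does not give the training objective, but a joint voiced/unvoiced and pitch framework of this kind is typically trained with a combined loss: a classification term on the voicing decision plus a regression term on pitch that is evaluated only over ground-truth voiced frames. The sketch below is an illustrative assumption, not the paper's actual loss; the function name `joint_pitch_loss`, the L1 pitch term, and the weighting factor `alpha` are all hypothetical choices.

```python
import numpy as np

def joint_pitch_loss(voicing_logits, pitch_pred, voicing_true, pitch_true,
                     alpha=0.5):
    """Hypothetical joint objective: binary cross-entropy on the
    voiced/unvoiced decision plus an L1 pitch-regression term that is
    computed only over frames the ground truth marks as voiced."""
    # Sigmoid turns the voicing logits into per-frame probabilities.
    p = 1.0 / (1.0 + np.exp(-voicing_logits))
    eps = 1e-12  # numerical floor for the logs
    bce = -np.mean(voicing_true * np.log(p + eps)
                   + (1.0 - voicing_true) * np.log(1.0 - p + eps))

    # Pitch error is masked so unvoiced frames contribute nothing:
    # the network is never penalized for pitch where no pitch exists.
    mask = voicing_true.astype(bool)
    pitch_l1 = np.mean(np.abs(pitch_pred[mask] - pitch_true[mask])) if mask.any() else 0.0

    return alpha * bce + (1.0 - alpha) * pitch_l1
```

Masking the regression term this way is what lets the two tasks be trained jointly without the unvoiced frames corrupting the pitch target, which is one plausible reading of why joint training helps in nonstationary noise.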


DOI: 10.21437/Interspeech.2020-3019

Cite as: Tran, D.N., Batricevic, U., Koishida, K. (2020) Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. Proc. Interspeech 2020, 175-179, DOI: 10.21437/Interspeech.2020-3019.


@inproceedings{Tran2020,
  author={Dung N. Tran and Uros Batricevic and Kazuhito Koishida},
  title={{Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={175--179},
  doi={10.21437/Interspeech.2020-3019},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3019}
}