Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network

Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li


Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. Knowledge of the speaker and of the phonetic content is believed to strengthen this prediction. In this paper, we propose to augment the low-band acoustic features with an i-vector and a phonetic posteriorgram (PPG), which represent the speaker and the phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features; it fully utilizes utterance-level temporal continuity information and avoids vanishing gradients. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.
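As a rough illustration of the feature augmentation and the two evaluation metrics named in the abstract, the following sketch uses NumPy only. It is not the authors' implementation: the function names, the feature dimensions, the choice to tile the utterance-level i-vector across frames before concatenation, and the exact LSD and SNR formulas (commonly used definitions in the bandwidth extension literature) are all assumptions made for illustration.

```python
import numpy as np


def augment_features(low_band, ivector, ppg):
    """Concatenate frame-level low-band features with speaker and phonetic cues.

    Assumed shapes (hypothetical, not taken from the paper):
      low_band: (frames, low_dim)   frame-level acoustic features
      ivector:  (spk_dim,)          one utterance-level speaker vector
      ppg:      (frames, ppg_dim)   frame-level phonetic posteriorgram
    """
    num_frames = low_band.shape[0]
    if ppg.shape[0] != num_frames:
        raise ValueError("PPG and low-band features need the same number of frames")
    # Repeat the single utterance-level i-vector for every frame.
    tiled_ivector = np.tile(ivector[np.newaxis, :], (num_frames, 1))
    return np.concatenate([low_band, tiled_ivector, ppg], axis=1)


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Mean over frames of the RMS difference of log10 power spectra.

    Both inputs are (frames, bins) power spectra; eps guards against log(0).
    """
    diff = np.log10(ref_power + eps) - np.log10(est_power + eps)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))


def snr_db(reference, estimate, eps=1e-10):
    """Signal-to-noise ratio in dB between two time-domain waveforms."""
    noise = reference - estimate
    signal_energy = np.sum(reference ** 2) + eps
    noise_energy = np.sum(noise ** 2) + eps
    return float(10.0 * np.log10(signal_energy / noise_energy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = augment_features(
        rng.standard_normal((5, 129)),   # hypothetical low-band feature size
        rng.standard_normal(100),        # hypothetical i-vector size
        rng.random((5, 40)),             # hypothetical PPG size
    )
    print(feats.shape)
```

Under these definitions a lower LSD and a higher SNR both indicate a closer match to the wideband reference, which is the direction of the relative improvements reported above.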


DOI: 10.21437/Interspeech.2020-1994

Cite as: Hou, N., Xu, C., Pham, V.T., Zhou, J.T., Chng, E.S., Li, H. (2020) Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network. Proc. Interspeech 2020, 4064-4068, DOI: 10.21437/Interspeech.2020-1994.


@inproceedings{Hou2020,
  author={Nana Hou and Chenglin Xu and Van Tung Pham and Joey Tianyi Zhou and Eng Siong Chng and Haizhou Li},
  title={{Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4064--4068},
  doi={10.21437/Interspeech.2020-1994},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1994}
}