FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction

Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu


In this paper, we propose FeatherWave, a variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of speech within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9× faster than real time on a single CPU core, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening tests show that FeatherWave can generate speech with better quality than LPCNet.
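
As a rough illustration of the multi-band linear-prediction idea described in the abstract, the following Python sketch shows how one generation step can advance all sub-bands at once, so that a single network evaluation accounts for several full-band samples. It is a minimal sketch under stated assumptions: the 4-band split is naive decimation (a real system would typically use an analysis/synthesis filter bank such as PQMF), the neural excitation model is replaced by a stub, and the helper names (fit_lpc, excitation_stub) are illustrative, not taken from the paper.

import numpy as np

NUM_BANDS = 4   # number of sub-bands, as in the paper's 4-band configuration
LPC_ORDER = 8   # illustrative linear-prediction order

def fit_lpc(x, order):
    """Least-squares LP fit so that x[n] ~= sum_k a[k] * x[n-1-k]."""
    cols = [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    X = np.stack(cols, axis=1)   # past samples for each prediction target
    y = x[order:]                # targets
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def excitation_stub(band, prediction, history):
    """Placeholder for the neural network that would emit the LP residual."""
    return 0.0  # a real vocoder samples this from the model's output distribution

# Toy full-band signal and a naive 4-band split by decimation.
fullband = np.sin(2 * np.pi * 220 * np.arange(24000) / 24000)
subbands = [fullband[b::NUM_BANDS] for b in range(NUM_BANDS)]

coeffs = [fit_lpc(sb, LPC_ORDER) for sb in subbands]
histories = [list(sb[:LPC_ORDER]) for sb in subbands]

# One generation step: each band advances by one sample, so one network
# evaluation covers NUM_BANDS full-band samples -- the source of the speed-up.
step_output = []
for band in range(NUM_BANDS):
    past = np.array(histories[band][-LPC_ORDER:])[::-1]    # x[n-1], x[n-2], ...
    prediction = float(np.dot(coeffs[band], past))          # LP prediction
    sample = prediction + excitation_stub(band, prediction, histories[band])
    histories[band].append(sample)
    step_output.append(sample)

print("sub-band samples from one step:", np.round(step_output, 4))

In a full vocoder, the per-band samples produced over time would be recombined by the synthesis filter bank to reconstruct the 24 kHz waveform.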


DOI: 10.21437/Interspeech.2020-1156

Cite as: Tian, Q., Zhang, Z., Lu, H., Chen, L.-H., Liu, S. (2020) FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction. Proc. Interspeech 2020, 195-199, DOI: 10.21437/Interspeech.2020-1156.


@inproceedings{Tian2020,
  author={Qiao Tian and Zewang Zhang and Heng Lu and Ling-Hui Chen and Shan Liu},
  title={{FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={195--199},
  doi={10.21437/Interspeech.2020-1156},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1156}
}