An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis

Yang Cui, Xi Wang, Lei He, Frank K. Soong

The LPCNet neural vocoder and its variants have shown that high-quality speech can be synthesized with a small footprint by exploiting domain knowledge of speech. In this paper, we introduce subband linear prediction into LPCNet to produce high-fidelity speech more efficiently while taking subband correlation into account. Speech is decomposed into multiple subband signals, each with its own linear prediction, to reduce the complexity of the neural vocoder. A novel subband-based autoregressive model is proposed to learn the joint distribution of the subband sequences under an assumption that preserves the dependence between subbands while accelerating inference. Because human auditory perception is sensitive to the harmonic components of speech in the baseband, we allocate more computational resources to modeling the low-frequency subband, so that the synthesized speech has natural phase and magnitude. Both objective and subjective tests show that the proposed subband LPCNet neural vocoder synthesizes higher-quality speech than the original fullband one (MOS 4.62 vs. 4.54) while running nearly three times faster.
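The linear-prediction step that LPCNet-style vocoders build on can be illustrated with a minimal sketch. This is not the authors' implementation: the subband analysis filterbank and the neural network are omitted, the signal is a synthetic second-order autoregressive process standing in for one subband, and the function names `lpc_coefficients` and `lpc_residual` are hypothetical. It only shows that a short linear predictor removes most of the predictable energy, leaving a lower-energy residual (excitation) for a network to model.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a such that x[n] is approximated by sum_k a[k] * x[n - 1 - k].
    """
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(order):
        # Reflection coefficient for raising the order from i to i + 1
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_residual(x, a):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n - 1 - k]."""
    pred = np.zeros_like(x)
    for k, ak in enumerate(a):
        pred[k + 1:] += ak * x[:len(x) - k - 1]
    return x - pred

# Synthetic stand-in for one subband signal (assumption: a stable AR(2) process)
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.standard_normal(20000)
x = np.zeros(20000)
for t in range(2, len(x)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + noise[t]

a_hat = lpc_coefficients(x, order=2)
residual = lpc_residual(x, a_hat)
energy_ratio = np.sum(residual ** 2) / np.sum(x ** 2)
```

In this toy setting `a_hat` should land close to `true_a`, and `energy_ratio` should be well below one. In a subband vocoder, the same analysis would be applied to each subband signal separately, which is where the abstract's complexity reduction is said to come from.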

DOI: 10.21437/Interspeech.2020-1463

Cite as: Cui, Y., Wang, X., He, L., Soong, F.K. (2020) An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis. Proc. Interspeech 2020, 3555-3559, DOI: 10.21437/Interspeech.2020-1463.

  author={Yang Cui and Xi Wang and Lei He and Frank K. Soong},
  title={{An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis}},
  booktitle={Proc. Interspeech 2020},