Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems

Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C.P. Ramos, Nicholas D. Lane

LPCNet is an efficient vocoder that combines linear prediction with deep neural network modules to keep computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computation in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19× run-time improvement over the baseline when running on a mobile device, with less than a 0.1 decrease in TTS mean opinion score (MOS).
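To make the sample-bunching idea concrete, here is a minimal toy sketch (not the paper's actual implementation): a stand-in autoregressive loop in which each "network inference" emits a bunch of S samples instead of one, so the number of inference calls drops by roughly a factor of S. The function name, state size, and the fake recurrence are all illustrative assumptions.

```python
import numpy as np

def toy_bunched_synthesis(num_samples, bunch_size, rng):
    """Toy illustration of sample-bunching: the stand-in network is
    invoked once per bunch of `bunch_size` samples rather than once per
    sample, cutting the number of inferences by that factor."""
    samples = []
    inference_calls = 0
    state = np.zeros(8)  # stand-in for LPCNet's recurrent (GRU) state
    while len(samples) < num_samples:
        # One "network inference" predicts a whole bunch of samples.
        inference_calls += 1
        state = np.tanh(state + 0.1 * rng.standard_normal(8))  # fake recurrence
        bunch = 0.01 * rng.standard_normal(bunch_size)         # fake samples
        samples.extend(bunch[: num_samples - len(samples)])
    return np.array(samples), inference_calls

rng = np.random.default_rng(0)
audio, calls = toy_bunched_synthesis(160, 2, rng)
# With bunch_size=2, 160 samples need only 80 inference calls.
```

In the real vocoder the savings are smaller than the ideal factor of S, since the output layer grows to predict the extra samples; bit-bunching targets exactly that enlarged final layer.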

DOI: 10.21437/Interspeech.2020-2041

Cite as: Vipperla, R., Park, S., Choo, K., Ishtiaq, S., Min, K., Bhattacharya, S., Mehrotra, A., Ramos, A.G.C., Lane, N.D. (2020) Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems. Proc. Interspeech 2020, 3565-3569, DOI: 10.21437/Interspeech.2020-2041.

@inproceedings{vipperla20_interspeech,
  author={Ravichander Vipperla and Sangjun Park and Kihyun Choo and Samin Ishtiaq and Kyoungbo Min and Sourav Bhattacharya and Abhinav Mehrotra and Alberto Gil C.P. Ramos and Nicholas D. Lane},
  title={{Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3565--3569},
  doi={10.21437/Interspeech.2020-2041}
}