Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet

Vadim Popov, Stanislav Kamenev, Mikhail Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko


We present a fast and lightweight on-device text-to-speech system based on state-of-the-art methods for feature and speech generation, namely Tacotron2 and LPCNet. We show that modifications to the basic pipeline, combined with hardware-specific optimizations and extensive parallelization, make it possible to run a TTS service even on low-end devices with faster-than-real-time waveform generation. Moreover, the system preserves high speech quality, with no noticeable degradation in Mean Opinion Score compared to the non-optimized baseline. Although the system is aimed mainly at low-to-mid-range hardware, we believe it can also be used in any CPU-based environment.
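"Faster than real time" is commonly quantified with the real-time factor (RTF): the wall-clock time spent generating audio divided by the duration of the audio produced. An RTF below 1.0 means synthesis outpaces playback. The minimal Python sketch below computes it. The names `real_time_factor`, `measure_rtf`, and the `synthesize` callable are hypothetical and for illustration only. The 16 kHz sample rate in the example is an assumption, not a figure or API taken from this paper.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return RTF = time spent generating audio / duration of that audio.

    RTF < 1.0 means the waveform was produced faster than it plays back.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int
) -> float:
    """Time a hypothetical `synthesize(text) -> samples` callable and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # Illustrative numbers: 2 s of 16 kHz audio (32000 samples) generated in 0.5 s.
    # RTF = 0.5 / 2.0 = 0.25, i.e. four times faster than real time.
    print(real_time_factor(0.5, 32000, 16000))
```

A single timing is noisy. Averaging the RTF over many utterances on the target device would likely give a more representative figure.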


DOI: 10.21437/Interspeech.2020-2169

Cite as: Popov, V., Kamenev, S., Kudinov, M., Repyevsky, S., Sadekova, T., Bushaev, V., Kryzhanovskiy, V., Parkhomenko, D. (2020) Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet. Proc. Interspeech 2020, 220-224, DOI: 10.21437/Interspeech.2020-2169.


@inproceedings{Popov2020,
  author={Vadim Popov and Stanislav Kamenev and Mikhail Kudinov and Sergey Repyevsky and Tasnima Sadekova and Vitalii Bushaev and Vladimir Kryzhanovskiy and Denis Parkhomenko},
  title={{Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={220--224},
  doi={10.21437/Interspeech.2020-2169},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2169}
}