Compression of End-to-End Models

Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang, Chung-Cheng Chiu

End-to-end models, which output text directly given speech using a single neural network, have been shown to be competitive with conventional speech recognition models containing separate acoustic, pronunciation and language model components. Such models do not require additional resources for decoding and are typically much smaller than conventional models. This makes them particularly attractive in the context of on-device speech recognition where both small memory footprint and low power consumption are critical. This work explores the problem of compressing end-to-end models with the goal of satisfying device constraints without sacrificing model accuracy. We evaluate matrix factorization, knowledge distillation and parameter sparsity to determine the most effective methods given constraints such as a fixed parameter budget.
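As a rough illustration of one technique the abstract mentions, the sketch below compresses a single weight matrix by low-rank factorization using a truncated SVD. The matrix shape and rank are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical example: replace a dense weight matrix W with two
# smaller factors A and B (low-rank factorization). The layer then
# computes x @ A @ B instead of x @ W, storing far fewer parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 512))  # illustrative weight matrix

rank = 64  # projection rank; lower rank = more compression, less fidelity
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]   # shape (1024, 64)
B = Vt[:rank, :]             # shape (64, 512)

orig_params = W.size                 # 1024 * 512  = 524288
compressed_params = A.size + B.size  # 1024*64 + 64*512 = 98304
print(orig_params, compressed_params)  # roughly a 5.3x parameter reduction
```

In practice the rank is chosen against a fixed parameter budget, and the factored model is typically fine-tuned to recover accuracy.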

DOI: 10.21437/Interspeech.2018-1025

Cite as: Pang, R., Sainath, T., Prabhavalkar, R., Gupta, S., Wu, Y., Zhang, S., Chiu, C. (2018) Compression of End-to-End Models. Proc. Interspeech 2018, 27-31, DOI: 10.21437/Interspeech.2018-1025.

@inproceedings{pang18_interspeech,
  author={Ruoming Pang and Tara Sainath and Rohit Prabhavalkar and Suyog Gupta and Yonghui Wu and Shuyuan Zhang and Chung-Cheng Chiu},
  title={Compression of End-to-End Models},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={27--31},
  doi={10.21437/Interspeech.2018-1025}
}