Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity

Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo

The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but it has millions of parameters. This makes it prohibitively expensive for memory-constrained hardware accelerators, because the storage demand increases dependence on off-chip memory, which becomes the bottleneck for latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference, which reduces weight storage for both training and inference hardware systems. We also jointly optimize in-training quantization with HCGS on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16× structured compression and 6-bit weight precision, we achieved a phoneme error rate (PER) of 16.9% for TIMIT and a word error rate (WER) of 18.9% for TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared to prior work.
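As a rough illustration of what a static, hierarchical block-wise mask could look like, the following NumPy sketch builds a two-level mask for a single weight matrix. It is not the authors' implementation: the function names (`block_mask`, `hcgs_mask`), the block sizes (64 and 16), the per-level keep ratio of 1/4, and the choice to keep a fixed number of blocks per block-row are all assumptions made for this example. They are chosen only so that the two levels multiply out to the 16× compression mentioned in the abstract.

```python
import numpy as np


def block_mask(rows, cols, block, keep_fraction, rng):
    """Random block mask: each block-row keeps a fixed fraction of its blocks.

    Hypothetical helper for illustration; `rows` and `cols` must be
    multiples of `block`.
    """
    n_block_rows, n_block_cols = rows // block, cols // block
    n_keep = int(round(n_block_cols * keep_fraction))
    grid = np.zeros((n_block_rows, n_block_cols), dtype=bool)
    for r in range(n_block_rows):
        kept = rng.choice(n_block_cols, size=n_keep, replace=False)
        grid[r, kept] = True
    # Expand every grid cell into a block x block tile.
    return np.repeat(np.repeat(grid, block, axis=0), block, axis=1)


def hcgs_mask(rows, cols, big=64, small=16,
              keep_big=0.25, keep_small=0.25, seed=0):
    """Two-level mask: keep some large blocks, then some small blocks in each.

    All default values are assumptions for this sketch, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_block_rows, n_block_cols = rows // big, cols // big
    n_keep = int(round(n_block_cols * keep_big))
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(n_block_rows):
        # Level 1: choose which large blocks survive in this block-row.
        for c in rng.choice(n_block_cols, size=n_keep, replace=False):
            # Level 2: inside a surviving large block, keep only some small blocks.
            mask[r * big:(r + 1) * big, c * big:(c + 1) * big] = block_mask(
                big, big, small, keep_small, rng)
    return mask


mask = hcgs_mask(512, 512)
weights = np.random.default_rng(1).standard_normal((512, 512))
sparse_weights = weights * mask   # dropped connections are exactly zero
print(mask.mean())                # fraction of weights kept: 1/4 * 1/4 = 0.0625
```

Because the mask is generated once from a fixed seed and never changes, a training loop following this sketch would reapply the same `mask` after every weight update, so only the retained blocks ever need to be stored. Weight quantization is not shown here.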

DOI: 10.21437/Interspeech.2020-1270

Cite as: Kadetotad, D., Meng, J., Berisha, V., Chakrabarti, C., Seo, J. (2020) Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity. Proc. Interspeech 2020, 21-25, DOI: 10.21437/Interspeech.2020-1270.

  author={Deepak Kadetotad and Jian Meng and Visar Berisha and Chaitali Chakrabarti and Jae-sun Seo},
  title={{Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity}},
  booktitle={Proc. Interspeech 2020},