Compressing End-to-end ASR Networks by Tensor-Train Decomposition

Takuma Mori, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

End-to-end deep learning has become a popular framework for automatic speech recognition (ASR) and has proven to be a powerful solution. Unfortunately, such networks commonly have millions of parameters, and large computational resources are required to train and run them. Moreover, many applications still prefer lightweight ASR models that can run efficiently on mobile or wearable devices. To address this challenge, we propose an approach that reduces the number of ASR parameters. Specifically, we perform Tensor-Train decomposition on the weight matrix of the gated recurrent unit (TT-GRU) in the end-to-end ASR framework. Experimental results on LibriSpeech data reveal that the compressed ASR with TT-GRU maintains good performance while greatly reducing the number of parameters.
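As a rough illustration of the idea behind the paper (not the authors' implementation), Tensor-Train decomposition reshapes a large weight matrix into a higher-order tensor and factorizes it into a chain of small 3-way cores via sequential truncated SVD; the cap on the TT-ranks controls the trade-off between parameter count and approximation accuracy. A minimal NumPy sketch, with all shapes and names chosen for illustration:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into TT cores via sequential truncated SVD
    (TT-SVD sketch; `max_rank` caps every TT-rank)."""
    shape = tensor.shape
    cores, r = [], 1
    C = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        C = C.reshape(r * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r_new = min(max_rank, len(S))
        cores.append(U[:, :r_new].reshape(r, shape[k], r_new))
        C = S[:r_new, None] * Vt[:r_new]   # carry the truncated remainder forward
        r = r_new
    cores.append(C.reshape(r, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))   # drop the boundary ranks of 1

# Toy stand-in for a GRU weight matrix: 32x64, reshaped into a 4-way tensor.
W = np.random.default_rng(0).normal(size=(32, 64))
T = W.reshape(4, 8, 8, 8)

# A large max_rank means no truncation, so reconstruction is exact.
cores = tt_svd(T, max_rank=64)
W_hat = tt_reconstruct(cores).reshape(32, 64)

# A small max_rank trades accuracy for far fewer stored parameters.
small = tt_svd(T, max_rank=4)
n_full, n_tt = W.size, sum(c.size for c in small)
```

With the rank cap at 4, the four cores together hold far fewer values than the original 32x64 matrix, which is the source of the compression reported in the paper.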

 DOI: 10.21437/Interspeech.2018-1543

Cite as: Mori, T., Tjandra, A., Sakti, S., Nakamura, S. (2018) Compressing End-to-end ASR Networks by Tensor-Train Decomposition. Proc. Interspeech 2018, 806-810, DOI: 10.21437/Interspeech.2018-1543.

@inproceedings{mori2018compressing,
  author={Takuma Mori and Andros Tjandra and Sakriani Sakti and Satoshi Nakamura},
  title={Compressing End-to-end ASR Networks by Tensor-Train Decomposition},
  booktitle={Proc. Interspeech 2018},
  pages={806--810},
  year={2018},
  doi={10.21437/Interspeech.2018-1543}
}