Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization

Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono

In this paper, we address constrained training for reducing the size of deep neural network-based acoustic models. Although the L2 regularizer is commonly used to shrink parameters, it assumes no group structure, so it cannot be used to cut away the unimportant parts of a model. The Group Lasso regularizer is therefore used for model size reduction. Group Lasso can treat arbitrary group parameters (e.g. the column vector norms of the parameter matrices) as candidates for unimportant parts and makes the parameters sparse at the group level. We can then prune the parameters whose group parameter norm is nearly zero. However, Group Lasso offers no clear rule for separating groups whose norms are close to zero from those whose norms are large in the group parameter space, and hence it is not well suited to model size reduction. To solve this problem, we propose a mixture distribution-based regularizer that assumes distributions of norms in the group parameter space. We evaluate our method on 1600 hours of real recorded NTT voice search data. Our proposal achieves a 27.0% reduction compared to the model pruned by Group Lasso while maintaining recognition performance.
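
The Group Lasso step described above can be illustrated with a minimal NumPy sketch. It treats each column of a weight matrix as one group, computes the penalty as the sum of the column L2 norms, and prunes the columns whose norm is nearly zero. The function names, the weight `lam`, and the cutoff `threshold` are assumptions made for illustration and are not taken from the paper; the hand-picked cutoff in particular is exactly the kind of ad hoc rule that the abstract says Group Lasso leaves unspecified. The proposed mixture distribution-based regularizer is not shown here.

```python
import numpy as np


def group_lasso_penalty(W, lam=1e-3):
    """Group Lasso penalty with one group per column of W.

    Returns lam times the sum of the L2 norms of the columns.
    `lam` is a hypothetical regularization weight.
    """
    return lam * np.linalg.norm(W, axis=0).sum()


def prune_columns(W, threshold=1e-3):
    """Drop the columns of W whose L2 norm is at or below `threshold`.

    `threshold` is a hypothetical, hand-picked cutoff.
    Returns the pruned matrix and the boolean mask of kept columns.
    """
    norms = np.linalg.norm(W, axis=0)
    keep = norms > threshold
    return W[:, keep], keep


# Toy weight matrix: the middle column has been driven close to zero.
W = np.array([[0.5, 1e-5, -0.3],
              [0.2, -2e-5, 0.4]])

pruned, keep = prune_columns(W)
print(keep)          # boolean mask over the original columns
print(pruned.shape)  # shape after removing near-zero columns
```

In a real network, removing a column (or row) of one layer's weight matrix also requires removing the matching bias entry and the corresponding row (or column) in the adjacent layer, which this sketch omits.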

DOI: 10.21437/Interspeech.2018-2062

Cite as: Yoshida, T., Moriya, T., Watanabe, K., Shinohara, Y., Yamaguchi, Y., Aono, Y. (2018) Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization. Proc. Interspeech 2018, 1269-1273, DOI: 10.21437/Interspeech.2018-2062.

  author={Tsukasa Yoshida and Takafumi Moriya and Kazuho Watanabe and Yusuke Shinohara and Yoshikazu Yamaguchi and Yushi Aono},
  title={Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization},
  booktitle={Proc. Interspeech 2018},