Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR

Pin-Tuan Huang, Hung-Shin Lee, Syu-Siang Wang, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang

Discriminative autoencoders (DcAEs) have been shown to improve the generalization of learned acoustic models by increasing their capacity to reconstruct the input features from the frame embeddings. In this paper, we integrate DcAEs into two models, namely TDNNs and LSTMs, which have been commonly adopted in the Kaldi recipes for LVCSR in recent years, using a modified version of the nnet3 neural network library. We also explore two kinds of skip-connection mechanisms for DcAEs, namely concatenation and addition. The results of LVCSR experiments on the MATBN Mandarin Chinese corpus and the WSJ English corpus show that the proposed DcAE-TDNN-based system achieves relative word error rate reductions of 3% and 10%, respectively, over the TDNN-based baseline system. The DcAE-TDNN-LSTM-based system also outperforms the TDNN-LSTM-based baseline system. These results suggest that DcAEs can be flexibly integrated with other existing or prospective neural network-based acoustic models.
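
The two skip-connection mechanisms named in the abstract, concatenation and addition, differ in how a layer's output is merged with an earlier activation. The following is a minimal sketch of that difference only, not the authors' nnet3 implementation; the function names `skip_concat` and `skip_add` and the plain-list vector representation are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def skip_concat(layer_out: Vector, skip_in: Vector) -> Vector:
    """Concatenation skip connection (hypothetical helper).

    The two vectors are stacked end to end, so the merged dimension is
    len(layer_out) + len(skip_in) and the inputs may differ in size.
    """
    return list(layer_out) + list(skip_in)


def skip_add(layer_out: Vector, skip_in: Vector) -> Vector:
    """Addition skip connection (hypothetical helper).

    The two vectors are summed element-wise, so both must have the same
    dimension and the merged dimension is unchanged.
    """
    if len(layer_out) != len(skip_in):
        raise ValueError("addition requires vectors of equal dimension")
    return [a + b for a, b in zip(layer_out, skip_in)]


if __name__ == "__main__":
    h = [1.0, 2.0]   # illustrative output of a later layer
    x = [3.0, 4.0]   # illustrative activation from an earlier layer
    print(skip_concat(h, x))
    print(skip_add(h, x))
```

In this sketch, concatenation preserves both sources at the cost of a wider input to the next layer, while addition keeps the dimension fixed but requires the two activations to match in size.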

DOI: 10.21437/Interspeech.2019-1717

Cite as: Huang, P., Lee, H., Wang, S., Chen, K., Tsao, Y., Wang, H. (2019) Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR. Proc. Interspeech 2019, 1631-1635, DOI: 10.21437/Interspeech.2019-1717.

  author={Pin-Tuan Huang and Hung-Shin Lee and Syu-Siang Wang and Kuan-Yu Chen and Yu Tsao and Hsin-Min Wang},
  title={{Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR}},
  booktitle={Proc. Interspeech 2019},