Active Learning for LF-MMI Trained Neural Networks in ASR

Yanhua Long, Hong Ye, Yijie Li, Jiaen Liang

This paper investigates how active learning (AL) affects the training of neural network acoustic models based on Lattice-free Maximum Mutual Information (LF-MMI) in automatic speech recognition (ASR). To fully exploit the most informative examples from fresh datasets, different data selection criteria based on heterogeneous neural networks were studied. In particular, we examined the relationship among the transcription cost of human labeling, example informativeness, and the data selection criteria for active learning. As a comparison, we tried both semi-supervised training (SST) and active learning to improve the acoustic models. Experiments were performed on both small-scale and large-scale ASR systems. The results suggest that our AL scheme benefits much more from the fresh data than SST does in reducing the word error rate (WER). AL yields a 6-13% relative WER reduction over a baseline trained on a 4000-hour transcribed dataset, while selecting only 1.2K hours of informative utterances for human labeling.
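As a rough illustration of the trade-off between labeling cost and informativeness described in the abstract, the following minimal Python sketch selects utterances for human transcription under a fixed budget of audio hours. It is not the selection criterion used in the paper, which the abstract does not specify. The `Utterance` fields, the use of a single per-utterance confidence score as a proxy for informativeness, and the greedy least-confident-first rule are all assumptions made for illustration. The helper `relative_wer_reduction` shows the standard way a relative WER reduction is computed.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    duration_hours: float
    confidence: float  # hypothetical recognizer confidence in [0, 1]


def select_for_labeling(pool, budget_hours):
    """Greedily pick the least-confident utterances that fit the hour budget.

    Assumption: low confidence is treated as high informativeness.
    Utterances that would exceed the remaining budget are skipped.
    """
    selected = []
    used_hours = 0.0
    for utt in sorted(pool, key=lambda u: u.confidence):
        if used_hours + utt.duration_hours <= budget_hours:
            selected.append(utt)
            used_hours += utt.duration_hours
    return selected


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Toy pool of unlabeled utterances (values are made up for the example).
pool = [
    Utterance("a", 0.5, 0.9),
    Utterance("b", 0.5, 0.2),
    Utterance("c", 1.0, 0.4),
    Utterance("d", 0.25, 0.6),
]
chosen = select_for_labeling(pool, budget_hours=1.0)
chosen_ids = [u.utt_id for u in chosen]
```

In this toy pool, utterance "c" is the second least confident but is skipped because it would exceed the one-hour budget, which is one simple way the cost of transcription can interact with a selection criterion.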

 DOI: 10.21437/Interspeech.2018-1162

Cite as: Long, Y., Ye, H., Li, Y., Liang, J. (2018) Active Learning for LF-MMI Trained Neural Networks in ASR. Proc. Interspeech 2018, 2898-2902, DOI: 10.21437/Interspeech.2018-1162.

  author={Yanhua Long and Hong Ye and Yijie Li and Jiaen Liang},
  title={Active Learning for LF-MMI Trained Neural Networks in ASR},
  booktitle={Proc. Interspeech 2018},