Unbiased Semi-Supervised LF-MMI Training Using Dropout

Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard

The lattice-free MMI (LF-MMI) objective with finite-state transducer (FST) supervision lattices has been used in semi-supervised training of state-of-the-art neural network acoustic models for automatic speech recognition (ASR). However, an FST-based supervision lattice does not sample from the posterior predictive distribution of word sequences. It only contains the decoding hypotheses corresponding to the maximum likelihood estimate of the weights, so training may be biased towards incorrect hypotheses in the supervision lattice even when the best path is perfectly correct. In this paper, we propose a novel framework that uses dropout at test time to sample from the posterior predictive distribution of word sequences and thereby produce unbiased supervision lattices for semi-supervised training. We investigate dropout sampling from both the acoustic model and the language model to generate supervision. Results on Fisher English show that the proposed approach achieves a WER recovery of ~51.6% over regular semi-supervised LF-MMI training.
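The core idea of the abstract is to keep dropout active at decoding time. Each forward pass then uses a different random thinning of the weights, so repeated decodes draw hypotheses from an approximate posterior predictive distribution rather than from the single maximum likelihood model. The toy sketch below illustrates only that sampling step, under loudly stated assumptions. It is not the authors' implementation. The linear frame classifier, the greedy frame-wise decoder with repeat collapsing, and the dictionary of hypothesis frequencies (standing in for a supervision lattice) are all hypothetical simplifications.

```python
import random
from collections import Counter


def dropout_scores(frame, weights, p, rng):
    """Score every label for one frame with inverted dropout on the input features.

    Hypothetical toy acoustic model: a linear classifier, weights[label] -> list.
    Each feature is kept with probability 1 - p and rescaled by 1 / (1 - p).
    """
    mask = [1.0 / (1.0 - p) if rng.random() >= p else 0.0 for _ in frame]
    return {
        label: sum(m * x * wi for m, x, wi in zip(mask, frame, w))
        for label, w in weights.items()
    }


def sample_hypothesis(frames, weights, p, rng):
    """One stochastic decode: frame-wise argmax, then collapse repeated labels."""
    hyp = []
    for frame in frames:
        scores = dropout_scores(frame, weights, p, rng)
        best = max(scores, key=scores.get)
        if not hyp or hyp[-1] != best:
            hyp.append(best)
    return tuple(hyp)


def mc_dropout_supervision(frames, weights, p=0.2, num_samples=20, seed=0):
    """Decode num_samples times with dropout ON and return hypothesis frequencies.

    The relative frequencies approximate the posterior predictive distribution
    over label sequences. With p = 0 this reduces to the single deterministic
    (maximum likelihood weights) hypothesis.
    """
    assert 0.0 <= p < 1.0, "dropout rate must be in [0, 1)"
    rng = random.Random(seed)
    counts = Counter(
        sample_hypothesis(frames, weights, p, rng) for _ in range(num_samples)
    )
    return {hyp: c / num_samples for hyp, c in counts.items()}


if __name__ == "__main__":
    frames = [[1.0, 0.2], [0.9, 0.8], [0.1, 1.0]]
    weights = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    print(mc_dropout_supervision(frames, weights, p=0.0))  # one hypothesis only
    print(mc_dropout_supervision(frames, weights, p=0.5, num_samples=50))
```

With `p=0.0` the sketch returns a single hypothesis, mirroring a supervision built from one maximum likelihood model. With `p>0`, competing hypotheses appear with frequencies reflecting model uncertainty. In the paper's setting, the analogous sampling is applied to the acoustic model and the language model to build supervision lattices.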

DOI: 10.21437/Interspeech.2019-2678

Cite as: Tong, S., Vyas, A., Garner, P.N., Bourlard, H. (2019) Unbiased Semi-Supervised LF-MMI Training Using Dropout. Proc. Interspeech 2019, 1576-1580, DOI: 10.21437/Interspeech.2019-2678.

  author={Sibo Tong and Apoorv Vyas and Philip N. Garner and Hervé Bourlard},
  title={{Unbiased Semi-Supervised LF-MMI Training Using Dropout}},
  booktitle={Proc. Interspeech 2019},