SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR

Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno


Recent developments in data augmentation have brought substantial improvements in automatic speech recognition (ASR). Parallel developments in augmentation policy search in the computer vision domain have shown improvements in model performance and robustness. In addition, recent developments in semi-supervised learning have shown that consistency measures are crucial for performance and robustness. In this work, we demonstrate that combining augmentation policies with consistency measures and model regularization can greatly improve speech recognition performance. Using the Librispeech task, we show: 1) symmetric consistency measures such as the Jensen-Shannon Divergence provide 4% relative improvements in ASR performance; 2) augmented adversarial inputs using Virtual Adversarial Noise (VAT) provide a 12% relative win; and 3) random sampling from an arbitrary combination of augmentation policies yields the best policy. These contributions result in an overall relative reduction in Word Error Rate (WER) of 15% on the Librispeech task presented in this paper.
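As context for the symmetric consistency measure the abstract mentions, the sketch below computes the Jensen-Shannon divergence between two categorical distributions, e.g. a model's posteriors on a clean input versus an augmented view of the same input. This is a minimal NumPy illustration of the standard JSD formula, not the paper's implementation; the distributions and function names are hypothetical.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    A small eps guards against log(0); p and q are assumed to sum to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def js_div(p, q):
    """Jensen-Shannon divergence: symmetric in p and q, bounded by log 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)

# Hypothetical posteriors over three classes for a clean input and an
# augmented view of the same input; JSD serves as a consistency penalty
# that is small when the two views agree.
p_clean = np.array([0.7, 0.2, 0.1])
p_aug = np.array([0.6, 0.3, 0.1])
consistency_loss = js_div(p_clean, p_aug)
```

Unlike the plain KL divergence, the JSD is symmetric (`js_div(p, q) == js_div(q, p)`), which is the property the abstract highlights for consistency training.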


DOI: 10.21437/Interspeech.2020-2920

Cite as: Wang, G., Rosenberg, A., Chen, Z., Zhang, Y., Ramabhadran, B., Moreno, P.J. (2020) SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR. Proc. Interspeech 2020, 2832-2836, DOI: 10.21437/Interspeech.2020-2920.


@inproceedings{Wang2020,
  author={Gary Wang and Andrew Rosenberg and Zhehuai Chen and Yu Zhang and Bhuvana Ramabhadran and Pedro J. Moreno},
  title={{SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2832--2836},
  doi={10.21437/Interspeech.2020-2920},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2920}
}