Bandpass Noise Generation and Augmentation for Unified ASR

Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu

Data simulation is a crucial technique for robust automatic speech recognition (ASR) systems. We develop this work in the scope of data augmentation and improve robustness by generating new bandpass noise resources from an existing noise corpus. We design numerous bandpass filters with varying center frequencies and bandwidths, and apply them to the corpus to obtain the corresponding bandpass noise samples. We augment our baseline data simulation with these bandpass noises to add robustness and improve generalization to generic and unknown acoustic scenarios. This work targets ASR robustness to individual subband noises, and improves robustness to unseen real-world noise that can be approximated as a factorial combination of subband noises. We demonstrate our work on a large-scale unified ASR task. We obtain a 7% relative word error rate reduction (WERR) across unseen acoustic conditions and an 11% WERR for kids' speech. We also demonstrate generalization to new ASR applications.
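
The bandpass noise generation step can be illustrated with a minimal sketch. The abstract does not state the filter design, the center frequencies, or the bandwidths, so everything specific here is an assumption made for illustration: a second-order (biquad) bandpass filter, a 16 kHz sampling rate, white Gaussian noise standing in for a clip from the noise corpus, and the hypothetical helper names `bandpass_coeffs`, `apply_filter`, and `make_bandpass_noises`. The quality factor is approximated as center frequency divided by bandwidth.

```python
import math
import random


def bandpass_coeffs(center_hz, bandwidth_hz, sample_rate):
    """Biquad bandpass coefficients with unit gain at center_hz (assumed design)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    q = center_hz / bandwidth_hz  # approximate quality factor
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_filter(samples, b, a):
    """Run the biquad difference equation over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def make_bandpass_noises(noise, sample_rate, centers_hz, bandwidths_hz):
    """Filter one noise clip with every valid (center, bandwidth) pair."""
    bank = {}
    for fc in centers_hz:
        for bw in bandwidths_hz:
            # Skip bands that would extend below 0 Hz or above Nyquist.
            if fc - bw / 2 <= 0 or fc + bw / 2 >= sample_rate / 2:
                continue
            b, a = bandpass_coeffs(fc, bw, sample_rate)
            bank[(fc, bw)] = apply_filter(noise, b, a)
    return bank


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


random.seed(0)
SAMPLE_RATE = 16000  # assumed sampling rate
# One second of white noise as a stand-in for a clip from the noise corpus.
noise = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]
bank = make_bandpass_noises(
    noise, SAMPLE_RATE, centers_hz=[500, 1000, 2000, 4000], bandwidths_hz=[200, 800]
)
```

Each entry of `bank` is one new noise resource derived from the same source clip. In an augmentation pipeline these clips would be mixed with clean speech alongside the original noises; that mixing step is outside this sketch.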

DOI: 10.21437/Interspeech.2020-2904

Cite as: Kumar, K., Ren, B., Gong, Y., Wu, J. (2020) Bandpass Noise Generation and Augmentation for Unified ASR. Proc. Interspeech 2020, 1683-1687, DOI: 10.21437/Interspeech.2020-2904.

  author={Kshitiz Kumar and Bo Ren and Yifan Gong and Jian Wu},
  title={{Bandpass Noise Generation and Augmentation for Unified ASR}},
  booktitle={Proc. Interspeech 2020},