Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR

Purvi Agrawal, Sriram Ganapathy


The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments, in part because the underlying representations used in the ASR system lack robustness. Auditory processing studies, on the other hand, have shown the importance of modulation-filtered spectrogram representations in robust human speech recognition. Inspired by this evidence, we propose a speech representation learning paradigm using data-driven 2-D spectro-temporal modulation filter learning. In particular, multiple representations are derived from the input speech spectrogram in an unsupervised manner using the convolutional restricted Boltzmann machine (CRBM) model. A filter selection criterion based on the average number of active hidden units is also employed to select the representations for ASR. The experiments are performed on the Wall Street Journal (WSJ) Aurora-4 database with clean and multi-condition training setups. In these experiments, the ASR results obtained from the proposed modulation filtering approach show significant robustness to noise and channel distortions compared to other feature extraction methods (average relative improvement of 19% over baseline features in clean training). Furthermore, ASR experiments performed on reverberant speech data from the REVERB challenge corpus highlight the benefits of the proposed representation learning scheme for far-field speech recognition.
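The filter selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each learned 2-D filter is applied to the spectrogram, that a hidden unit counts as "active" when its sigmoid activation probability exceeds 0.5, and that the score is the fraction of active units on a single spectrogram. The function names, the threshold, and the bias handling are hypothetical. The abstract does not state whether a higher or lower score is preferred, so the sketch only ranks filters and leaves the choice to the caller. Cross-correlation is used for simplicity; a true convolution differs only by flipping the filter.

```python
import math


def correlate2d_valid(spec, filt):
    """Valid-mode 2-D cross-correlation of a spectrogram with one filter.

    spec and filt are lists of lists (frequency x time).
    """
    rows, cols = len(spec), len(spec[0])
    fr, fc = len(filt), len(filt[0])
    out = []
    for i in range(rows - fr + 1):
        row = []
        for j in range(cols - fc + 1):
            acc = 0.0
            for a in range(fr):
                for b in range(fc):
                    acc += spec[i + a][j + b] * filt[a][b]
            row.append(acc)
        out.append(row)
    return out


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def average_active_units(spec, filt, bias=0.0, threshold=0.5):
    """Fraction of hidden units whose activation probability exceeds threshold.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    pre = correlate2d_valid(spec, filt)
    total = sum(len(r) for r in pre)
    active = sum(1 for r in pre for v in r if sigmoid(v + bias) > threshold)
    return active / total


def rank_filters(spec, filters, biases=None):
    """Return filter indices sorted by descending score, plus the raw scores."""
    if biases is None:
        biases = [0.0] * len(filters)
    scores = [average_active_units(spec, f, b) for f, b in zip(filters, biases)]
    order = sorted(range(len(filters)), key=lambda k: scores[k], reverse=True)
    return order, scores


# Toy usage: a 3x3 "spectrogram" and two 2x2 filters (illustrative values only).
toy_spec = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
toy_filters = [
    [[-1.0, -1.0], [-1.0, -1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
toy_order, toy_scores = rank_filters(toy_spec, toy_filters)
```

In practice the score would presumably be averaged over many training spectrograms rather than computed on one, and the filters and biases would come from the trained CRBM rather than being set by hand.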


DOI: 10.21437/Interspeech.2017-901

Cite as: Agrawal, P., Ganapathy, S. (2017) Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR. Proc. Interspeech 2017, 2446-2450, DOI: 10.21437/Interspeech.2017-901.


@inproceedings{Agrawal2017,
  author={Purvi Agrawal and Sriram Ganapathy},
  title={Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2446--2450},
  doi={10.21437/Interspeech.2017-901},
  url={http://dx.doi.org/10.21437/Interspeech.2017-901}
}