Far-Field Speech Recognition Using Multivariate Autoregressive Models

Sriram Ganapathy, Madhumita Harish

Automatic speech recognition in far-field reverberant environments is challenging even for state-of-the-art recognition systems. The main issues are signal artifacts caused by long-term reverberation, which results in temporal smearing. Autoregressive (AR) modeling for speech feature extraction represents the high-energy regions of the signal, which are less susceptible to noise. In this paper, we propose a new method of speech feature extraction based on multivariate AR (MAR) modeling of temporal envelopes. Discrete cosine transform (DCT) coefficients obtained from multiple sub-bands of speech are used in a multivariate linear prediction setting to derive features for speech recognition. For single-channel far-field speech recognition, the features are derived using multi-band linear prediction. For multi-channel far-field speech recognition, the data from multiple channels are used in the MAR framework. We perform several speech recognition experiments on the REVERB Challenge database in single- and multi-microphone settings. In these experiments, the proposed feature extraction method provides significant improvements over the baseline methods: average relative improvements of 9.7% and 3.9% in single-microphone conditions for clean and multi-condition training, respectively, and 6.3% in multi-microphone conditions. The results with clean training in single-microphone conditions further illustrate the effectiveness of the MAR features.
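The abstract describes multivariate linear prediction on sub-band DCT coefficients only at a high level. The Python sketch below shows one way such a pipeline could look; it is not the authors' implementation. It assumes uniform-width sub-bands, a least-squares MAR fit, and envelope read-out from the diagonal of the fitted model's spectral matrix, following the frequency-domain linear prediction view in which an AR model fitted to DCT coefficients approximates the temporal envelope of the segment. The function names (`dct2_ortho`, `subband_sequences`, `fit_mar`, `mar_envelopes`), the band count, model order, and segment length are hypothetical choices for illustration, and any later steps for turning envelopes into recognition features are omitted.

```python
import numpy as np


def dct2_ortho(x):
    """Orthonormal DCT-II of a 1-D signal via an explicit cosine matrix (fine for short segments)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    basis[0] *= 1 / np.sqrt(2)  # scale the DC row so the transform is orthonormal
    return np.sqrt(2 / n) * basis @ x


def subband_sequences(coeffs, num_bands):
    """Stack equal-width sub-bands of DCT coefficients as a (band_len, num_bands) sequence.

    Uniform band splitting is an illustrative assumption."""
    band_len = len(coeffs) // num_bands
    trimmed = coeffs[: band_len * num_bands]
    return trimmed.reshape(num_bands, band_len).T  # column k = sub-band k


def fit_mar(y, order):
    """Least-squares fit of y[t] = sum_p A_p y[t-p] + e[t].

    Returns A with shape (order, K, K) and the residual covariance Sigma (K, K)."""
    T, K = y.shape
    # Regressor row for time t: [y[t-1], y[t-2], ..., y[t-order]] concatenated.
    Z = np.hstack([y[order - p : T - p] for p in range(1, order + 1)])
    target = y[order:]
    B, *_ = np.linalg.lstsq(Z, target, rcond=None)
    # Row-vector form target ~ Z @ B, so each A_p is the transpose of its block of B.
    A = np.stack([B[(p - 1) * K : p * K].T for p in range(1, order + 1)])
    resid = target - Z @ B
    sigma = resid.T @ resid / len(target)
    return A, sigma


def mar_envelopes(A, sigma, num_points):
    """Diagonal of H(w) Sigma H(w)^H with H(w) = (I - sum_p A_p e^{-jwp})^{-1}, w in [0, pi).

    In the FDLP reading, entry (i, k) is a smoothed envelope value for sub-band k
    at the i-th time position of the segment."""
    order, K, _ = A.shape
    omegas = np.pi * np.arange(num_points) / num_points
    env = np.empty((num_points, K))
    for i, w in enumerate(omegas):
        lags = np.exp(-1j * w * np.arange(1, order + 1))
        H = np.linalg.inv(np.eye(K) - np.tensordot(lags, A, axes=1))
        env[i] = np.real(np.diag(H @ sigma @ H.conj().T))
    return env


# Usage on a synthetic, slowly amplitude-modulated noise segment (hypothetical parameters).
rng = np.random.default_rng(0)
n = 1600
t = np.arange(n)
x = rng.standard_normal(n) * (1 + np.sin(2 * np.pi * t / n))
Y = subband_sequences(dct2_ortho(x), num_bands=4)
A, sigma = fit_mar(Y, order=10)
env = mar_envelopes(A, sigma, num_points=100)
print(env.shape)  # (100, 4)
```

Fitting all sub-bands jointly lets the off-diagonal terms of each `A_p` capture correlations between bands, which separate per-band AR models would ignore. For the multi-microphone case the abstract only states that multi-channel data are used in the MAR framework; one plausible reading, not confirmed here, is to place the same sub-band from each microphone in the columns of `y`.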

DOI: 10.21437/Interspeech.2018-2003

Cite as: Ganapathy, S., Harish, M. (2018) Far-Field Speech Recognition Using Multivariate Autoregressive Models. Proc. Interspeech 2018, 3023-3027, DOI: 10.21437/Interspeech.2018-2003.

  author={Sriram Ganapathy and Madhumita Harish},
  title={Far-Field Speech Recognition Using Multivariate Autoregressive Models},
  booktitle={Proc. Interspeech 2018},