Anti-Aliasing Regularization in Stacking Layers

Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar

Shift-invariance is a desirable property of many machine learning models. It means that delaying the input of a model in time should only result in delaying its prediction in time. A model that is shift-invariant also eliminates undesirable side effects such as frequency aliasing. When building sequence models, the shift-invariance property must be preserved not only when sampling input features but also inside the model itself. Here, we study the impact of the commonly used stacking layer in LSTM-based ASR models and show that aliasing is likely to occur. Experimentally, by adding merely 7 parameters to an existing speech recognition model that has 120 million parameters, we are able to reduce the impact of aliasing. This acts as a regularizer that discards frequencies the model shouldn’t be relying on for predictions. Our results show that, under conditions unseen at training, we are able to reduce the word error rate by up to 5% relative.
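The aliasing effect described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the stacking factor of 2, the 7-tap Hamming-windowed sinc filter, and the helper names `stack_frames`, `lowpass_taps`, and `smooth_in_time` are all assumptions made for illustration. In particular, reading the "7 parameters" as the taps of a temporal low-pass filter is a guess that the abstract does not confirm.

```python
import numpy as np

def stack_frames(x, factor):
    """Concatenate `factor` consecutive frames, reducing the frame rate by `factor`."""
    t = (x.shape[0] // factor) * factor
    return x[:t].reshape(t // factor, factor * x.shape[1])

def lowpass_taps(num_taps=7, cutoff=0.25):
    """Hamming-windowed sinc low-pass filter (hypothetical design).

    `cutoff` is in cycles per input frame; 0.25 is the Nyquist limit
    after the frame rate has been halved.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()  # unit gain at zero frequency

def smooth_in_time(x, taps):
    """Filter every feature dimension along the time axis with the same taps."""
    cols = [np.convolve(x[:, d], taps, mode="same") for d in range(x.shape[1])]
    return np.stack(cols, axis=1)

# Toy one-dimensional "feature" oscillating at 0.4 cycles/frame, which is
# above the 0.25 cycles/frame limit that a stride of 2 can represent.
frames = np.arange(200)
x = np.cos(2 * np.pi * 0.4 * frames)[:, None]

taps = lowpass_taps()
stacked_plain = stack_frames(x, factor=2)
stacked_filtered = stack_frames(smooth_in_time(x, taps), factor=2)

# The first component of each stacked frame is the input decimated by 2.
# Without filtering, the 0.4 cycles/frame tone survives as an alias.
energy_plain = np.mean(stacked_plain[:, 0] ** 2)
energy_filtered = np.mean(stacked_filtered[:, 0] ** 2)
print(f"aliased energy without filter: {energy_plain:.3f}")
print(f"aliased energy with filter:    {energy_filtered:.3f}")
```

In this toy setting the filter removes most of the energy that would otherwise fold back into the lower band after the frame rate is halved. In a trained model the taps could be fixed or learned; the sketch fixes them only for simplicity.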

DOI: 10.21437/Interspeech.2020-1497

Cite as: Bruguier, A., Misra, A., Narayanan, A., Prabhavalkar, R. (2020) Anti-Aliasing Regularization in Stacking Layers. Proc. Interspeech 2020, 314-318, DOI: 10.21437/Interspeech.2020-1497.

  author={Antoine Bruguier and Ananya Misra and Arun Narayanan and Rohit Prabhavalkar},
  title={{Anti-Aliasing Regularization in Stacking Layers}},
  booktitle={Proc. Interspeech 2020},