Continual Learning for Multi-Dialect Acoustic Models

Brady Houston, Katrin Kirchhoff

Using data from multiple dialects has shown promise in improving neural network acoustic models. Such training can improve the performance of an acoustic model on a single dialect, and it can also produce a model that performs well on multiple dialects. However, training an acoustic model on pooled data from multiple dialects takes a significant amount of time and computing resources, and the model must be retrained every time a new dialect is added. In contrast, sequential transfer learning (fine-tuning) does not require retraining on all data, but may result in catastrophic forgetting of previously seen dialects. Using data from four English dialects, we demonstrate that, with loss functions that mitigate catastrophic forgetting, sequential transfer learning can be used to train multi-dialect acoustic models that narrow the WER gap between the best case (combined training) and the worst case (fine-tuning) by up to 65%. Continual learning shows great promise in minimizing training time while approaching the performance of models that require much more training time.
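As a rough illustration of how a "WER gap narrowed by X%" figure can be read, the sketch below computes the fraction of the gap between a worst-case and a best-case system that a third system closes. The function name `gap_closed` and all WER values are hypothetical and chosen only to make the arithmetic visible; they are not results from the paper, and the paper may compute its figure differently.

```python
def gap_closed(best_wer: float, worst_wer: float, model_wer: float) -> float:
    """Return the fraction of the worst-to-best WER gap closed by a model.

    best_wer:  WER of the best-case system (e.g. combined training)
    worst_wer: WER of the worst-case system (e.g. plain fine-tuning)
    model_wer: WER of the system being evaluated
    """
    gap = worst_wer - best_wer
    if gap <= 0:
        raise ValueError("worst_wer must be greater than best_wer")
    return (worst_wer - model_wer) / gap


# Hypothetical WERs (in percent), for illustration only.
closed = gap_closed(best_wer=10.0, worst_wer=14.0, model_wer=11.4)
print(f"{closed:.0%} of the gap closed")
```

Under this reading, a value of 0 means the model is no better than plain fine-tuning and a value of 1 means it matches combined training.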

DOI: 10.21437/Interspeech.2020-1797

Cite as: Houston, B., Kirchhoff, K. (2020) Continual Learning for Multi-Dialect Acoustic Models. Proc. Interspeech 2020, 576-580, DOI: 10.21437/Interspeech.2020-1797.

  author={Brady Houston and Katrin Kirchhoff},
  title={{Continual Learning for Multi-Dialect Acoustic Models}},
  booktitle={Proc. Interspeech 2020},