Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering

Ryu Takeda, Kazunori Komatani


We present a new frame-wise online unsupervised adaptation method for an acoustic model based on a deep neural network (DNN), in contrast to many existing methods that assume offline, supervised processing. We use a likelihood cost function conditioned on past observations, which mathematically integrates unsupervised adaptation and decoding for automatic speech recognition (ASR). The key challenge is that DNN parameter updates should be robust to outliers (model mismatch) and to ASR recognition errors. Inspired by robust adaptive filtering techniques, we propose 1) parameter update control to remove the influence of outliers and 2) regularization using the L2-norm of the DNN's posterior probabilities of specific phonemes that are prone to recognition errors. Experiments with various speakers showed that phoneme recognition accuracy improved by up to 6.3 points, with an average error reduction rate of 10%.
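The two robustness mechanisms named in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the authors' algorithm; it rests on several assumptions that do not come from the paper: only a linear-softmax output layer `W` is adapted, the decoder's hypothesis for the current frame is supplied as `label`, outliers are handled by a hard cost threshold `outlier_thresh` (one simple gate in the spirit of robust adaptive filters; the paper's actual control rule may differ), and the regularizer is assumed to be an added penalty `reg_weight * sum(p[s]**2 for s in reg_classes)` over a hypothetical set of error-prone phoneme indices. All function and parameter names are illustrative.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def robust_online_step(W, x, label, lr=0.1, outlier_thresh=3.0,
                       reg_weight=0.01, reg_classes=()):
    """One hypothetical frame-wise update of a linear-softmax layer W (K x D).

    - `label`: stand-in for the decoder hypothesis of this frame (assumption).
    - `outlier_thresh`: frames whose cost -log p[label] exceeds it are treated
      as outliers and produce no update (assumed gating rule).
    - `reg_classes`: assumed error-prone phoneme indices whose squared
      posteriors are penalized with weight `reg_weight` (assumed form/sign).
    Returns (W_new, cost, applied).
    """
    K, D = len(W), len(x)
    z = [sum(W[k][d] * x[d] for d in range(D)) for k in range(K)]
    p = softmax(z)
    cost = -math.log(p[label])

    # Update control: skip frames that look like outliers or likely errors.
    if cost > outlier_thresh:
        return W, cost, False

    # Gradient of -log p[label] w.r.t. the logits: p - onehot(label).
    g = [p[k] - (1.0 if k == label else 0.0) for k in range(K)]

    # Gradient of sum_{s in S} p_s^2 w.r.t. z_k, using dp_s/dz_k = p_s(delta_sk - p_k):
    #   2 * (p_k^2 * [k in S] - p_k * sum_{s in S} p_s^2)
    S = set(reg_classes)
    sq = sum(p[s] ** 2 for s in S)
    for k in range(K):
        r = 2.0 * ((p[k] ** 2 if k in S else 0.0) - p[k] * sq)
        g[k] += reg_weight * r

    # Chain rule through z_k = W[k] . x, then one SGD step.
    W_new = [[W[k][d] - lr * g[k] * x[d] for d in range(D)] for k in range(K)]
    return W_new, cost, True
```

For example, starting from `W = [[0.0, 0.0]] * 3` and `x = [1.0, 0.5]`, a call with `label=0` raises the posterior of class 0, while the same call with `outlier_thresh=0.5` leaves `W` untouched because the initial cost ln 3 ≈ 1.10 exceeds the threshold. The hard gate is the simplest choice; a smoother alternative would be to scale the step by a Huber-style weight instead of discarding the frame.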


DOI: 10.21437/Interspeech.2020-1301

Cite as: Takeda, R., Komatani, K. (2020) Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering. Proc. Interspeech 2020, 1291-1295, DOI: 10.21437/Interspeech.2020-1301.


@inproceedings{Takeda2020,
  author={Ryu Takeda and Kazunori Komatani},
  title={{Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1291--1295},
  doi={10.21437/Interspeech.2020-1301},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1301}
}