Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning

Huaxin Wu, Genshun Wan, Jia Pan


The performance of automatic speech recognition systems can be improved by speaker adaptive training (SAT), which adapts an acoustic model to compensate for the mismatch between training and testing conditions. Speaker code learning is one effective approach to SAT. It learns a set of speaker-dependent codes together with a speaker-independent acoustic model in order to remove speaker variation. Conventionally, the speaker-dependent codes and the speaker-independent acoustic model are jointly optimized. However, this can make it difficult to decouple the speaker code from the acoustic model. In this paper, we treat speaker code based SAT as a meta-learning task. The acoustic model is regarded as meta-knowledge, while the speaker code is regarded as task-specific knowledge. Experiments on the Switchboard task show that our method not only learns a good speaker code, but also improves the performance of the acoustic model even without the speaker code.
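The split between shared meta-knowledge and per-speaker task knowledge can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the network architecture, loss, or update rules, so everything below is an assumption made for illustration. The "acoustic model" is reduced to a single shared weight `w`, the "speaker code" to a scalar offset `c`, the data are synthetic, and the outer update uses the first-order approximation of model-agnostic meta-learning. All function names and hyperparameters (`adapt_code`, `meta_train`, `inner_lr`, `outer_lr`) are hypothetical.

```python
import random

def mse_grads(w, c, data):
    """Return (loss, dL/dw, dL/dc) for the toy model y_hat = w * x + c."""
    n = len(data)
    loss = gw = gc = 0.0
    for x, y in data:
        err = w * x + c - y
        loss += err * err / n
        gw += 2.0 * err * x / n
        gc += 2.0 * err / n
    return loss, gw, gc

def adapt_code(w, support, inner_lr=0.25, steps=10):
    """Inner loop: fit only the speaker code; the shared weight stays fixed."""
    c = 0.0  # assumption: every speaker starts from the same neutral code
    for _ in range(steps):
        _, _, gc = mse_grads(w, c, support)
        c -= inner_lr * gc
    return c

def meta_train(speakers, outer_lr=0.5, epochs=200):
    """Outer loop: update the shared weight on query data after adaptation."""
    w = 0.0
    for _ in range(epochs):
        gw_total = 0.0
        for support, query in speakers:
            c = adapt_code(w, support)
            # First-order approximation: treat the adapted code as a constant
            # and ignore its dependence on w when differentiating.
            _, gw, _ = mse_grads(w, c, query)
            gw_total += gw
        w -= outer_lr * gw_total / len(speakers)
    return w

def make_speaker(rng, true_w, offset, n=20):
    """Synthetic 'speaker': a shared slope plus a speaker-specific offset."""
    pts = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pts.append((x, true_w * x + offset))
    return pts[: n // 2], pts[n // 2:]  # (support set, query set)

rng = random.Random(0)
speakers = [make_speaker(rng, true_w=2.0, offset=o) for o in (-1.0, 0.5, 1.5)]
w = meta_train(speakers)
```

In this sketch the shared weight is only ever updated on the loss measured after the code has been adapted, which is one way to keep the two kinds of parameters decoupled. Calling `adapt_code(w, support)` on a speaker's support set then estimates that speaker's code without touching `w`.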


DOI: 10.21437/Interspeech.2020-2296

Cite as: Wu, H., Wan, G., Pan, J. (2020) Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning. Proc. Interspeech 2020, 4362-4366, DOI: 10.21437/Interspeech.2020-2296.


@inproceedings{Wu2020,
  author={Huaxin Wu and Genshun Wan and Jia Pan},
  title={{Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4362--4366},
  doi={10.21437/Interspeech.2020-2296},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2296}
}