Unsupervised Regularization-Based Adaptive Training for Speech Recognition

Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du

In this paper, we propose two novel regularization-based speaker adaptive training approaches for connectionist temporal classification (CTC) based speech recognition. The first method is center loss (CL) regularization, which penalizes the distances between the embeddings of different speakers and a single shared center. The second method is speaker variance loss (SVL) regularization, in which we directly minimize the interclass variance between speakers during model training. Both methods train an adaptive model on the fly by adding a regularization term to the training loss function. Our experiment on the AISHELL-1 Mandarin recognition task shows that both methods are effective at adapting the CTC model without requiring any specific fine-tuning or additional complexity, achieving character error rate improvements of up to 8.1% and 8.6% over the speaker-independent (SI) model, respectively.
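The two regularizers described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names (`center_loss`, `speaker_variance_loss`, `total_loss`), the use of squared Euclidean distance, the per-utterance speaker labels in `speaker_ids`, and the weighting factor `weight` are all hypothetical and may differ from the formulation in the paper.

```python
import numpy as np


def center_loss(embeddings, center):
    """Mean squared Euclidean distance from each embedding to one shared center.

    Assumed form of a CL-style penalty; the paper's definition may differ.
    """
    diffs = embeddings - center
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def speaker_variance_loss(embeddings, speaker_ids):
    """Spread of per-speaker mean embeddings around their overall mean.

    Assumed form of an SVL-style penalty: average squared distance between
    each speaker's mean embedding and the mean of all speaker means.
    """
    speakers = np.unique(speaker_ids)
    means = np.stack(
        [embeddings[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    global_mean = means.mean(axis=0)
    return float(np.mean(np.sum((means - global_mean) ** 2, axis=1)))


def total_loss(ctc_loss, reg_loss, weight=0.1):
    """Main training loss plus a weighted regularization term.

    `weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    return ctc_loss + weight * reg_loss


# Toy batch: four 2-dimensional embeddings from two speakers.
embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
speaker_ids = np.array([0, 0, 1, 1])
center = np.array([1.0, 1.0])

cl = center_loss(embeddings, center)
svl = speaker_variance_loss(embeddings, speaker_ids)
```

In both cases the penalty shrinks as embeddings from different speakers move closer together, which matches the stated goal of reducing speaker variability during training rather than in a separate fine-tuning step.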

DOI: 10.21437/Interspeech.2020-1689

Cite as: Ding, F., Guo, W., Gu, B., Ling, Z., Du, J. (2020) Unsupervised Regularization-Based Adaptive Training for Speech Recognition. Proc. Interspeech 2020, 996-1000, DOI: 10.21437/Interspeech.2020-1689.

  author={Fenglin Ding and Wu Guo and Bin Gu and Zhen-Hua Ling and Jun Du},
  title={{Unsupervised Regularization-Based Adaptive Training for Speech Recognition}},
  booktitle={Proc. Interspeech 2020},