Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts

Yizhou Lu, Mingkun Huang, Hao Li, Jiaqi Guo, Yanmin Qian


Code-switching speech recognition is a challenging task that has been studied in much previous work, and one of its main challenges is the lack of code-switching data. In this paper, we study end-to-end models for Mandarin-English code-switching automatic speech recognition. External monolingual data are utilized to alleviate the data sparsity problem. More importantly, we propose a bi-encoder transformer network based on a Mixture of Experts (MoE) architecture to better leverage these data. We decouple Mandarin and English modeling with two separate encoders to better capture language-specific information, and a gating network is employed to explicitly handle the language identification task. For the gating network, different models and training modes are explored to learn better MoE interpolation coefficients. Experimental results show that, compared with the baseline transformer model, the proposed MoE architecture obtains up to 10.4% relative error reduction on the code-switching test set.
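The core idea of the abstract — two language-specific encoders whose outputs are interpolated frame by frame with weights from a gating network — can be sketched as follows. This is a minimal illustration under assumed shapes and a simple linear gate; the names (`h_man`, `h_eng`, `W_g`) and the concatenation-based gate input are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of bi-encoder MoE interpolation (not the paper's code).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, D = 5, 8  # number of frames, hidden size (illustrative values)

# Stand-ins for the outputs of the two language-specific encoders.
h_man = rng.standard_normal((T, D))  # Mandarin encoder output
h_eng = rng.standard_normal((T, D))  # English encoder output

# Gating network: per-frame language weights computed from both streams.
W_g = rng.standard_normal((2 * D, 2))
g = softmax(np.concatenate([h_man, h_eng], axis=-1) @ W_g)  # shape (T, 2)

# MoE interpolation: per-frame weighted sum of the two encoder outputs.
h_moe = g[:, :1] * h_man + g[:, 1:] * h_eng  # shape (T, D)
```

Because the gate weights are a softmax over the two experts, they sum to one per frame, so `h_moe` is a convex combination of the Mandarin and English encoder outputs; training the gate thus amounts to a soft, frame-level language identification, matching the role the abstract assigns to the gating network.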


DOI: 10.21437/Interspeech.2020-2485

Cite as: Lu, Y., Huang, M., Li, H., Guo, J., Qian, Y. (2020) Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts. Proc. Interspeech 2020, 4766-4770, DOI: 10.21437/Interspeech.2020-2485.


@inproceedings{Lu2020,
  author={Yizhou Lu and Mingkun Huang and Hao Li and Jiaqi Guo and Yanmin Qian},
  title={{Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4766--4770},
  doi={10.21437/Interspeech.2020-2485},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2485}
}