Towards Language-Universal Mandarin-English Speech Recognition

Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie

Multilingual and code-switching speech recognition are two challenging tasks that much previous work has studied separately. In this work, we study the multilingual and code-switching problems jointly and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model that consists of two subnets, each initialized from a monolingual system, and a shared output layer over Character-Subword acoustic modeling units. The bilingual acoustic model is trained on a large Mandarin-English corpus with the CTC and sMBR criteria. We find that this model, which is given no information about language identity, achieves performance on monolingual Mandarin and English test sets comparable to that of well-trained language-specific Mandarin and English ASR systems, respectively. More importantly, the proposed bilingual model automatically learns language switching. Experimental results on a Mandarin-English code-switching test set show relative error reductions of 11.8% and 17.9% on the Mandarin and English parts, respectively.
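To make the idea of Character-Subword modeling units more concrete, the following is a minimal sketch of how a mixed Mandarin-English transcript might be mapped to a single unit sequence: each Mandarin character becomes one unit, and each English word is split into subwords. The abstract does not specify how the subword inventory is built or applied, so the toy vocabulary, the greedy longest-match split, and the function names `is_cjk`, `split_subwords`, and `to_units` are all assumptions made for illustration and not the authors' method.

```python
def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def split_subwords(word, vocab):
    """Greedy longest-match split of an English word into subword units.

    Assumption: `vocab` is a hypothetical subword inventory. Any span not
    covered by it falls back to single letters.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Accept the longest known piece, or a single letter as fallback.
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def to_units(text, vocab):
    """Map mixed text to one sequence of character and subword units."""
    units, buf = [], ""
    for ch in text:
        if ch.isascii() and ch.isalpha():
            buf += ch.lower()  # accumulate the current English word
            continue
        if buf:  # a non-letter ends the current English word
            units.extend(split_subwords(buf, vocab))
            buf = ""
        if is_cjk(ch):
            units.append(ch)  # one unit per Mandarin character
    if buf:
        units.extend(split_subwords(buf, vocab))
    return units


toy_vocab = {"play", "mus", "ic"}  # hypothetical subword inventory
units = to_units("我想听 play music", toy_vocab)
```

Because both languages share one unit sequence, a single output layer can score Mandarin characters and English subwords side by side, with no language tag attached to the input.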

DOI: 10.21437/Interspeech.2019-1365

Cite as: Zhang, S., Liu, Y., Lei, M., Ma, B., Xie, L. (2019) Towards Language-Universal Mandarin-English Speech Recognition. Proc. Interspeech 2019, 2170-2174, DOI: 10.21437/Interspeech.2019-1365.

  author={Shiliang Zhang and Yuan Liu and Ming Lei and Bin Ma and Lei Xie},
  title={{Towards Language-Universal Mandarin-English Speech Recognition}},
  booktitle={Proc. Interspeech 2019},