Code-Switching Sentence Generation by Bert and Generative Adversarial Networks

Yingying Gao, Junlan Feng, Ying Liu, Leijing Hou, Xin Pan, Yong Ma

Code-switching has become a common linguistic phenomenon. Compared to monolingual ASR tasks, insufficient data is a major challenge for code-switching speech recognition. In this paper, we propose an approach that compositionally employs the Bidirectional Encoder Representations from Transformers (Bert) model and the Generative Adversarial Network (GAN) model for code-switching text data generation. It improves upon previous work by (1) applying Bert as a masked language model to predict the mixed-in foreign words and (2) building on the GAN framework, with Bert serving as both the generator and discriminator, to further ensure that the generated sentences are similar enough to natural examples. We evaluate the effectiveness of the generated data by its contribution to ASR. Experiments show our approach can reduce the English word error rate by 1.5% on the Mandarin-English code-switching spontaneous speech corpus OC16-CE80.
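The core generation step described in the abstract, masking a position in a monolingual sentence and letting a masked language model propose the mixed-in foreign word, can be sketched as follows. This is a toy illustration only: `toy_score` and the `COOCCUR` table are hypothetical stand-ins for Bert's masked-LM probability P(word | context), and the GAN-based filtering of candidates is omitted.

```python
def generate_code_switched(tokens, mask_index, candidates, score):
    """Mask tokens[mask_index] and substitute the best-scoring foreign word."""
    masked = list(tokens)
    masked[mask_index] = "[MASK]"
    best = max(candidates, key=lambda w: score(masked, mask_index, w))
    masked[mask_index] = best
    return masked

# Hypothetical stand-in for a masked-LM scorer: a tiny co-occurrence
# table keyed on the word immediately left of the masked position.
COOCCUR = {("开",): {"meeting": 0.9, "apple": 0.01}}

def toy_score(masked, i, word):
    left = (masked[i - 1],) if i > 0 else ()
    return COOCCUR.get(left, {}).get(word, 0.0)

sent = ["明天", "开", "会", "改", "到", "下午"]
print(generate_code_switched(sent, 2, ["meeting", "apple"], toy_score))
# → ['明天', '开', 'meeting', '改', '到', '下午']
```

In the paper's actual pipeline, the scorer would be Bert's masked-LM head, and a Bert-based discriminator would additionally judge whether the resulting sentence resembles natural code-switching.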

 DOI: 10.21437/Interspeech.2019-2501

Cite as: Gao, Y., Feng, J., Liu, Y., Hou, L., Pan, X., Ma, Y. (2019) Code-Switching Sentence Generation by Bert and Generative Adversarial Networks. Proc. Interspeech 2019, 3525-3529, DOI: 10.21437/Interspeech.2019-2501.

@inproceedings{gao19_interspeech,
  author={Yingying Gao and Junlan Feng and Ying Liu and Leijing Hou and Xin Pan and Yong Ma},
  title={{Code-Switching Sentence Generation by Bert and Generative Adversarial Networks}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3525--3529},
  doi={10.21437/Interspeech.2019-2501}
}