Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks

Chia-Yu Li, Ngoc Thang Vu


This paper presents our latest effort to improve Code-switching language models, which suffer from data scarcity. We investigate methods to augment Code-switching training text data by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks that transfers monolingual text into Code-switching text, treating Code-switching as a speaking style. Our experimental results on the SEAME corpus show that using artificially generated Code-switching text data consistently improves both language modeling and automatic speech recognition performance.
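To make the cycle-consistent adversarial setup concrete, below is a minimal, illustrative PyTorch sketch of the objective the abstract describes; it is not the authors' implementation. For simplicity, the two generators map fixed-size sentence embeddings rather than raw token sequences, and all architectures, dimensions, and hyperparameters (EMB_DIM, lam, the learning rates) are placeholder assumptions. The lam-weighted cycle loss is what ties the two directions together: a sentence mapped to the Code-switching style and back should reconstruct the original.

# Illustrative sketch only: CycleGAN-style objective for monolingual <-> Code-switching
# style transfer on sentence embeddings. All names and dimensions are assumptions.
import torch
import torch.nn as nn

EMB_DIM = 256  # assumed sentence-embedding size

def mlp(din, dout):
    # Placeholder generator/discriminator architecture.
    return nn.Sequential(nn.Linear(din, 512), nn.ReLU(), nn.Linear(512, dout))

G = mlp(EMB_DIM, EMB_DIM)       # generator: monolingual -> Code-switching
F = mlp(EMB_DIM, EMB_DIM)       # generator: Code-switching -> monolingual
D_cs = mlp(EMB_DIM, 1)          # discriminator on the Code-switching domain
D_mono = mlp(EMB_DIM, 1)        # discriminator on the monolingual domain

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
opt_g = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(list(D_cs.parameters()) + list(D_mono.parameters()), lr=2e-4)

def train_step(mono, cs, lam=10.0):
    """One CycleGAN training step on a batch of sentence embeddings."""
    real = torch.ones(mono.size(0), 1)
    fake = torch.zeros(mono.size(0), 1)

    # Generators: fool the discriminators and reconstruct the input.
    fake_cs = G(mono)
    fake_mono = F(cs)
    adv = bce(D_cs(fake_cs), real) + bce(D_mono(fake_mono), real)
    # Cycle consistency: mono -> CS -> mono and CS -> mono -> CS.
    cyc = l1(F(fake_cs), mono) + l1(G(fake_mono), cs)
    g_loss = adv + lam * cyc
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Discriminators: separate real from generated embeddings.
    d_loss = (bce(D_cs(cs), real) + bce(D_cs(fake_cs.detach()), fake)
              + bce(D_mono(mono), real) + bce(D_mono(fake_mono.detach()), fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return g_loss.item(), d_loss.item()

# Toy usage with random stand-in "embeddings".
mono_batch, cs_batch = torch.randn(32, EMB_DIM), torch.randn(32, EMB_DIM)
print(train_step(mono_batch, cs_batch))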


DOI: 10.21437/Interspeech.2020-2177

Cite as: Li, C., Vu, N.T. (2020) Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks. Proc. Interspeech 2020, 1057-1061, DOI: 10.21437/Interspeech.2020-2177.


@inproceedings{Li2020,
  author={Chia-Yu Li and Ngoc Thang Vu},
  title={{Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1057--1061},
  doi={10.21437/Interspeech.2020-2177},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2177}
}