Should Code-switching Models Be Asymmetric?

Barbara E. Bullock, Gualberto Guzmán, Jacqueline Serigos, Almeida Jacqueline Toribio

Since the work of Joshi [1], most models of code-switching (C-S) have assumed asymmetry of the participating languages. While there exist patterns of language mixing in which a dominant or matrix language (ML) may not be discernible, these more complex signatures are rarely modeled [2, 3]. We use a series of metrics to characterize the switching in corpora as asymmetrical (insertional C-S) or symmetrical (alternational C-S). We test the efficacy of a linguistic model that assumes no ML in predicting the syntax of C-S in three Spanish–English corpora that vary according to whether the ML is Spanish, English, or indeterminate. Our results show that the same constraints on the grammatical junctures and on the directionality of switching hold irrespective of the symmetry of the data. The length of the alternating language spans varies according to part of speech (POS), with noun phrases comprising the shortest spans. This suggests that insertional C-S may be subsumed under alternational C-S, as spontaneous borrowing. These results invite researchers to reconsider the linguistic theories they adopt and to expand the typology of training data used in creating language models and processing tools for C-S.
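To make the notion of "alternating language spans" concrete, the following is a minimal sketch, assuming tokens have already been tagged with a language label. The tags (`en`, `es`), the function names, and the example sentence are hypothetical illustrations and are not taken from the authors' corpora or pipeline.

```python
from itertools import groupby


def language_spans(tagged_tokens):
    """Collapse (token, lang) pairs into (lang, span_length) runs.

    A span is a maximal run of consecutive tokens sharing one language tag.
    """
    return [
        (lang, sum(1 for _ in run))
        for lang, run in groupby(tagged_tokens, key=lambda pair: pair[1])
    ]


def switch_count(tagged_tokens):
    """Number of switch points, i.e. boundaries between adjacent spans."""
    return max(len(language_spans(tagged_tokens)) - 1, 0)


# Hypothetical hand-tagged utterance (assumption, not corpus data).
example = [
    ("I", "en"), ("saw", "en"),
    ("la", "es"), ("casa", "es"),
    ("yesterday", "en"),
]

spans = language_spans(example)
switches = switch_count(example)
```

Under this representation, a corpus dominated by very short spans in one language embedded in long spans of the other would look insertional, while spans of comparable length in both languages would look alternational; the paper's own metrics are not reproduced here.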

DOI: 10.21437/Interspeech.2018-1284

Cite as: Bullock, B.E., Guzmán, G., Serigos, J., Toribio, A.J. (2018) Should Code-switching Models Be Asymmetric? Proc. Interspeech 2018, 2534-2538, DOI: 10.21437/Interspeech.2018-1284.

  author={Barbara E. Bullock and Gualberto Guzmán and Jacqueline Serigos and Almeida Jacqueline Toribio},
  title={Should Code-switching Models Be Asymmetric?},
  booktitle={Proc. Interspeech 2018},