Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech

Shammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali


Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Although spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token level, taking into account both linguistic and acoustic cues for dialectal Arabic. For a detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules on the DCS corpus. Our results highlight the importance of lexical information for discriminating the DCS labels. We observe that the performance of different models is highly dependent on the degree of code-mixing at the token level as well as on its complexity at the utterance level.
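The abstract does not say how the degree of code-mixing is quantified. As an illustration only, one widely used utterance-level measure is the Code-Mixing Index (CMI) of Das and Gambäck. The sketch below assumes each token carries a variety label such as `MSA` or `EGY`, plus a hypothetical `other` tag for dialect-neutral tokens (punctuation, names, and so on). Both the label set and the choice of CMI are assumptions and may not match the paper's exact metric.

```python
from collections import Counter

# Hypothetical tag for tokens with no dialect information; this is an
# assumption, not necessarily the tag set used in the DCS corpus.
NEUTRAL = "other"


def code_mixing_index(labels: list[str]) -> float:
    """Utterance-level Code-Mixing Index in the style of Das and Gambäck.

    CMI = 100 * (1 - max_i(w_i) / (n - u)) if n > u, else 0,
    where n is the number of tokens, u is the number of neutral tokens,
    and w_i is the number of tokens labeled with variety i.
    """
    n = len(labels)
    # Count only tokens that carry a variety label.
    counts = Counter(label for label in labels if label != NEUTRAL)
    u = n - sum(counts.values())
    if n == u:
        # No labeled tokens (empty or all-neutral utterance): no mixing.
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / (n - u))


if __name__ == "__main__":
    # Two MSA tokens, one Egyptian token, one neutral token.
    print(code_mixing_index(["MSA", "MSA", "EGY", "other"]))
    # A monolingual utterance yields 0.
    print(code_mixing_index(["EGY", "EGY"]))
```

A score of 0 means every labeled token belongs to one variety, and higher values mean the tokens are spread more evenly across varieties. Neutral tokens are excluded from the denominator so they neither raise nor lower the index.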


DOI: 10.21437/Interspeech.2020-2271

Cite as: Chowdhury, S.A., Samih, Y., Eldesouki, M., Ali, A. (2020) Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech. Proc. Interspeech 2020, 2382-2386, DOI: 10.21437/Interspeech.2020-2271.


@inproceedings{Chowdhury2020,
  author={Shammur A. Chowdhury and Younes Samih and Mohamed Eldesouki and Ahmed Ali},
  title={{Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2382--2386},
  doi={10.21437/Interspeech.2020-2271},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2271}
}