Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition

Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee


Automatic speech recognition (ASR) systems, which transcribe speech into written text, are usually built either as lexicon-based hybrid systems or as character-based end-to-end acoustic models. While hybrid systems require a manually designed pronunciation lexicon, end-to-end models can be trained directly on character-level transcripts. This removes the need to define a lexicon for non-English languages for which a standard lexicon may be absent. Korean is relatively phonetic and has a unique writing system, so it is worth investigating which modeling units are useful for end-to-end Korean ASR. Our work is the first to compare the performance of deep neural networks (DNNs), designed as a combination of connectionist temporal classification and an attention-based encoder-decoder, across various lexicon-free Korean modeling units. Experiments on the Zeroth-Korean dataset and on medical records, which consist of Korean-only and Korean-English code-switching corpora respectively, show that DNNs based on syllables and sub-words significantly outperform Jamo-based models on Korean ASR tasks. Our successful application of lexicon-free modeling units to non-English ASR tasks provides compelling evidence that lexicon-free approaches can alleviate the difficulties posed by the heavy code-switching found in non-English medical transcriptions.
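To make the difference between syllable and Jamo units concrete, the sketch below splits a short code-switched string both ways. It is only an illustration: the abstract does not describe the authors' actual tokenization, and the function names `to_syllables` and `to_jamo` as well as the sample string are hypothetical. It relies on the fact that Unicode canonical decomposition (NFD) breaks each precomposed Hangul syllable into its conjoining Jamo. Sub-word units, which are typically learned from data, are not shown.

```python
import unicodedata


def to_syllables(text):
    """Treat each precomposed Hangul syllable block (and every other
    character, including Latin letters and spaces) as one unit."""
    return list(text)


def to_jamo(text):
    """Split Hangul syllables into conjoining Jamo via NFD normalization.
    Characters without a decomposition (e.g. ASCII letters) pass through."""
    return list(unicodedata.normalize("NFD", text))


# Hypothetical code-switched sample, chosen for illustration only.
sample = "한국어 MRI"

syllable_units = to_syllables(sample)
jamo_units = to_jamo(sample)

# Jamo units yield longer target sequences over a smaller symbol inventory.
print(len(syllable_units), syllable_units)
print(len(jamo_units), jamo_units)

# Recomposing with NFC recovers the original text, so nothing is lost.
assert unicodedata.normalize("NFC", "".join(jamo_units)) == sample
```

The trade-off this exposes is the usual one: Jamo units keep the output vocabulary small but lengthen the sequences the decoder must predict, while syllable units shorten the sequences at the cost of a much larger inventory.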


DOI: 10.21437/Interspeech.2020-2440

Cite as: Wang, J., Kim, J., Kim, S., Lee, Y. (2020) Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition. Proc. Interspeech 2020, 1072-1075, DOI: 10.21437/Interspeech.2020-2440.


@inproceedings{Wang2020,
  author={Jisung Wang and Jihwan Kim and Sangki Kim and Yeha Lee},
  title={{Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1072--1075},
  doi={10.21437/Interspeech.2020-2440},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2440}
}