Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages

Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz

Developing multilingual Automatic Speech Recognition (ASR) systems makes it possible to share existing speech and text corpora among languages. We have conducted experiments on the development of multilingual Acoustic Models (AM) and Language Models (LM) for Tigrigna. Using an Amharic Deep Neural Network (DNN) AM together with a Tigrigna pronunciation dictionary and trigram LM, we achieved a Word Error Rate (WER) of 30.9% for Tigrigna. Adding training speech from the target language (Tigrigna) to the full training speech of the donor language (Amharic) reduces WER steadily as the amount of added data increases. We have also developed several multilingual LMs, including ones based on recurrent neural networks, and achieved a relative WER reduction of 3.56% compared to monolingual trigram LMs. Because decoding with very large vocabularies demands scarce computational resources, we have also experimented with morphemes as pronunciation and language modeling units. We achieved a character error rate (CER) of 7.9%, which is 38.3% to 1.3% lower in relative terms than the CER of word-based models with vocabularies smaller than 162k. Our results show that it is possible to develop an ASR system for an Ethio-Semitic language using existing speech and text corpora of another language in the family.
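The abstract reports results as WER, CER, and relative reductions. The following is a minimal sketch of how such figures are commonly computed, assuming the standard edit-distance definitions; it is not the authors' scoring pipeline. The function names are hypothetical, and counting spaces as characters in the CER is an assumption that may differ from the paper's setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent: word-level edits over reference length."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate in percent (assumption: spaces count as characters)."""
    return 100.0 * edit_distance(list(ref), list(hyp)) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction in percent of an error rate against a baseline."""
    return 100.0 * (baseline - improved) / baseline
```

With toy inputs (not data from the paper), `wer("a b c d", "a x c d")` gives 25.0, and `relative_reduction(10.0, 9.0)` gives 10.0. A relative reduction is the drop divided by the baseline error rate, so it is not the same as a difference in percentage points.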

DOI: 10.21437/Interspeech.2020-2856

Cite as: Abate, S.T., Tachbelie, M.Y., Schultz, T. (2020) Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages. Proc. Interspeech 2020, 1047-1051, DOI: 10.21437/Interspeech.2020-2856.

  author={Solomon Teferra Abate and Martha Yifiru Tachbelie and Tanja Schultz},
  title={{Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages}},
  booktitle={Proc. Interspeech 2020},