Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages

Hardik B. Sailor, Thomas Hain


This paper proposes a multilingual acoustic modeling approach for Indian languages using a Multitask Learning (MTL) framework. Language-specific phoneme recognition is explored as an auxiliary task in the MTL framework alongside the primary task of multilingual senone classification. This auxiliary task regularizes the primary task with both the context-independent phonemes and the language identities induced by the language-specific phonemes. The MTL network is also extended by structuring the primary and auxiliary task outputs as a Structured Output Layer (SOL) such that each depends on the other. The experiments are performed on a database of three Indian languages: Gujarati, Tamil, and Telugu. The experimental results show that the proposed MTL-SOL framework outperforms the baseline monolingual systems, with relative reductions in word error rate of 3.1–4.4% and 2.9–4.1% on the development and evaluation sets, respectively.
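As a rough illustration of the MTL objective described above, the following sketch combines a primary senone cross-entropy with an auxiliary cross-entropy over language-tagged phoneme labels. It is not the authors' implementation: the label format, the function names, and the `aux_weight` value are assumptions made for this example, and the SOL coupling between the two outputs is not modeled.

```python
import math


def language_specific_label(language, phoneme):
    """Tag a context-independent phoneme with its language (hypothetical format)."""
    return f"{language}:{phoneme}"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target_index])


def mtl_loss(senone_logits, senone_target,
             phoneme_logits, phoneme_target, aux_weight=0.3):
    """Primary senone loss plus a weighted auxiliary phoneme loss.

    aux_weight is an assumed interpolation weight, not a value from the paper.
    """
    primary = cross_entropy(senone_logits, senone_target)
    auxiliary = cross_entropy(phoneme_logits, phoneme_target)
    return primary + aux_weight * auxiliary


# The same phoneme symbol becomes a distinct auxiliary target per language,
# so the auxiliary output implicitly encodes language identity.
aux_targets = [language_specific_label(lang, "a")
               for lang in ("gujarati", "tamil", "telugu")]
```

Because each auxiliary target carries a language tag, predicting it requires the shared layers to encode both phoneme and language information, which is the regularizing effect the abstract refers to.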


DOI: 10.21437/Interspeech.2020-2739

Cite as: Sailor, H.B., Hain, T. (2020) Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. Proc. Interspeech 2020, 4756-4760, DOI: 10.21437/Interspeech.2020-2739.


@inproceedings{Sailor2020,
  author={Hardik B. Sailor and Thomas Hain},
  title={{Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4756--4760},
  doi={10.21437/Interspeech.2020-2739},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2739}
}