Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems

Srikanth Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlicek, Hervé Bourlard, Daniel Povey


Multilingual acoustic model training combines data from multiple languages to train an automatic speech recognition system. Such a system is beneficial when training data for a target language is limited. Lattice-Free Maximum Mutual Information (LF-MMI) training performs sequence discrimination by introducing competing hypotheses through a denominator graph in the cost function. The standard approach to training a multilingual model with LF-MMI is to combine the acoustic units from all languages and use a common denominator graph. The resulting model is either used as a feature extractor to train an acoustic model for the target language or fine-tuned directly. In this work, we propose a scalable approach to training the multilingual acoustic model using a typical multitask network within the LF-MMI framework. A set of language-dependent denominator graphs is used to compute the cost function. The proposed approach is evaluated on typical multilingual ASR tasks using the GlobalPhone and BABEL datasets. Relative improvements of up to 13.2% in WER are obtained compared to the corresponding monolingual LF-MMI baselines. The implementation is made available as part of the Kaldi speech recognition toolkit.
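The core idea above (a shared network, one output head per language, and a language-dependent denominator term in the MMI objective) can be illustrated with a toy sketch. This is not the Kaldi implementation: the names (`NUM_UNITS`, `head_scores`, `lfmmi_like_objective`) are hypothetical, and the per-language denominator graph is reduced to a degenerate graph accepting all unit sequences with uniform weights, so the path sum factorises per frame. The real method sums over the paths of a per-language denominator FST built from a phone language model.

```python
# Toy sketch of multitask LF-MMI-style training with language-dependent
# denominators. NOT the paper's Kaldi implementation; parameters are random
# and the "denominator graph" is degenerate (all sequences, uniform weights).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-language acoustic-unit counts (illustrative only).
NUM_UNITS = {"lang_a": 3, "lang_b": 4}
FEAT_DIM, HIDDEN = 5, 8

# Shared encoder plus one output head per language (the multitask part).
W_shared = rng.normal(size=(FEAT_DIM, HIDDEN))
W_head = {lang: rng.normal(size=(HIDDEN, n)) for lang, n in NUM_UNITS.items()}

def head_scores(feats, lang):
    """Unnormalised per-frame scores: shared encoder followed by the
    language-specific output head."""
    return np.tanh(feats @ W_shared) @ W_head[lang]

def lfmmi_like_objective(feats, labels, lang):
    """MMI-style objective: numerator (score of the reference unit sequence)
    minus a language-dependent denominator (log-sum over competing
    sequences). With this toy uniform denominator graph, the sum over all
    sequences factorises into a per-frame log-sum-exp."""
    scores = head_scores(feats, lang)                       # (T, num_units)
    numerator = scores[np.arange(len(labels)), labels].sum()
    denominator = np.logaddexp.reduce(scores, axis=1).sum()
    return numerator - denominator                          # always <= 0
```

Each training utterance contributes through the head and denominator of its own language, while gradients for `W_shared` pool data from all languages, which is where the multilingual benefit comes from.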


DOI: 10.21437/Interspeech.2020-2919

Cite as: Madikeri, S., Khonglah, B.K., Tong, S., Motlicek, P., Bourlard, H., Povey, D. (2020) Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems. Proc. Interspeech 2020, 4746-4750, DOI: 10.21437/Interspeech.2020-2919.


@inproceedings{Madikeri2020,
  author={Srikanth Madikeri and Banriskhem K. Khonglah and Sibo Tong and Petr Motlicek and Hervé Bourlard and Daniel Povey},
  title={{Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4746--4750},
  doi={10.21437/Interspeech.2020-2919},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2919}
}