Improving Speech Recognition of Compound-Rich Languages

Prabhat Pandey, Volker Leutnant, Simon Wiesler, Jahn Heymann, Daniel Willett

Traditional hybrid speech recognition systems use a fixed vocabulary for recognition, which is a challenge for agglutinative and compounding languages because they contain a large number of rare words. This causes a high out-of-vocabulary rate and leads to poor probability estimates for rare words. It is also important to keep the vocabulary size in check for a low-latency WFST-based speech recognition system. Previous work has addressed this problem by using subword units in language model training and merging them back to reconstruct words in a post-processing step. In this paper, we extend such open-vocabulary approaches by focusing on the compounding aspect. We present a data-driven unsupervised method to identify compound words in the vocabulary and learn rules to segment them. We show that compound modeling can achieve a 3% to 8% relative reduction in word error rate and up to a 9% reduction in vocabulary size compared to word-based models. We also show the importance of consistency between the lexicon employed during decoding and acoustic model training for subword-based systems.
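The post-processing step mentioned in the abstract, in which subword units are merged back into words, can be illustrated with a minimal sketch. The trailing "+" continuation marker, the function name merge_subwords, and the German example tokens are assumptions made for illustration only; the abstract does not state which marking convention or segmentation rules the authors use.

```python
def merge_subwords(tokens, marker="+"):
    """Join subword units back into full words.

    Assumption (not from the paper): a token ending in `marker`
    is glued to the token that follows it.
    """
    words = []
    pending = ""  # prefix accumulated from marked units
    for token in tokens:
        if token.endswith(marker):
            # Strip the marker and keep the unit as a prefix.
            pending += token[: -len(marker)]
        else:
            # An unmarked token closes the current word.
            words.append(pending + token)
            pending = ""
    if pending:
        # A dangling marked unit at the end is kept as its own word.
        words.append(pending)
    return words


# Hypothetical recognizer output with one segmented compound.
hypothesis = ["die", "haus+", "tür", "ist", "offen"]
print(" ".join(merge_subwords(hypothesis)))
```

In this sketch the vocabulary only needs the units "haus+" and "tür" rather than every compound built from them, which is the intuition behind the vocabulary-size reduction reported above.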

DOI: 10.21437/Interspeech.2020-2514

Cite as: Pandey, P., Leutnant, V., Wiesler, S., Heymann, J., Willett, D. (2020) Improving Speech Recognition of Compound-Rich Languages. Proc. Interspeech 2020, 4936-4940, DOI: 10.21437/Interspeech.2020-2514.

  author={Prabhat Pandey and Volker Leutnant and Simon Wiesler and Jahn Heymann and Daniel Willett},
  title={{Improving Speech Recognition of Compound-Rich Languages}},
  booktitle={Proc. Interspeech 2020},