14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Morpheme Level Hierarchical Pitman-Yor Class-Based Language Models for LVCSR of Morphologically Rich Languages

Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

Large vocabulary continuous speech recognition (LVCSR) for morphologically rich languages is a challenging task. The morphological richness of such languages leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities. In this case, the use of morphemes has been shown to increase the lexical coverage and lower the LM perplexity. Another approach to improving the LM probability estimates is to incorporate additional knowledge sources into the LM estimation process using class-based LMs (CLMs). Recently, hierarchical Pitman-Yor LMs (HPYLMs) have shown superiority over modified Kneser-Ney (MKN) smoothed N-gram LMs in terms of both perplexity (PPL) and word error rate (WER) on word-based LVCSR tasks. In this paper, hierarchical Pitman-Yor class-based LMs (HPYCLMs) are combined with morpheme-level language modeling. This enables the application of the proposed models on top of morpheme-based systems. Experiments are conducted on Arabic and German LVCSR tasks. Consistent performance improvements are obtained on all the available corpora compared to conventional morpheme-based and class-based LMs.

Bibliographic reference. Mousa, Amr El-Desoky / Shaik, M. Ali Basha / Schlüter, Ralf / Ney, Hermann (2013): "Morpheme level hierarchical Pitman-Yor class-based language models for LVCSR of morphologically rich languages", In INTERSPEECH-2013, 3409-3413.