13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Morpheme Level Feature-based Language Models for German LVCSR

Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

Human Language Technology and Pattern Recognition – Computer Science Department, RWTH Aachen University, Germany

One of the challenges for Large Vocabulary Continuous Speech Recognition (LVCSR) of German is its complex morphology and high degree of compounding, which lead to high out-of-vocabulary (OOV) rates and poorly estimated Language Model (LM) probabilities. In such cases, building LMs at the morpheme level can be a better choice, since it yields higher lexical coverage and lower LM perplexities. Separately, a successful approach to improving LM probability estimation is to incorporate features of words using feature-based LMs. In this paper, we use features derived for morphemes as well as for words, thereby combining the benefits of morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based, and factored LMs (FLMs). Relative reductions of around 1.5% in Word Error Rate (WER) are achieved compared to the best previous results obtained using FLMs.
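The lexical-coverage argument can be illustrated with a minimal sketch. The corpus, the `SEGMENTS` table, and the function names are hypothetical toy assumptions, not data or tooling from the paper. The sketch only shows how splitting compounds into known sub-units can lower the OOV rate for an unseen compound.

```python
def oov_rate(tokens, vocab):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


# Hypothetical hand-made segmentation table (an assumption for illustration;
# a real system would use a morphological analyzer or a learned segmenter).
SEGMENTS = {
    "sprachmodell": ["sprach", "modell"],
    "spracherkennung": ["sprach", "erkennung"],
    "modellerkennung": ["modell", "erkennung"],
}


def to_morphemes(words):
    """Replace each word by its sub-units; unknown words stay whole."""
    units = []
    for w in words:
        units.extend(SEGMENTS.get(w, [w]))
    return units


# Toy training and test text (hypothetical).
train_words = ["das", "sprachmodell", "und", "die", "spracherkennung"]
test_words = ["die", "modellerkennung"]  # compound unseen in training

word_vocab = set(train_words)
morph_vocab = set(to_morphemes(train_words))

word_oov = oov_rate(test_words, word_vocab)
morph_oov = oov_rate(to_morphemes(test_words), morph_vocab)

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morph_oov:.2f}")
```

In this toy setting the unseen compound is out of vocabulary at the word level, while both of its sub-units were already observed inside other compounds during training.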

Index Terms: language model, morpheme, stream-based, class-based, factored

Bibliographic reference. Mousa, Amr El-Desoky / Basha Shaik, M. Ali / Schlüter, Ralf / Ney, Hermann (2012): "Morpheme level feature-based language models for German LVCSR", in INTERSPEECH-2012, 170-173.