This paper investigates data sampling and dimensionality reduction approaches for discriminative language models (DLM). DLM is a feature-based language modeling approach whose aim is to rerank automatic speech recognition (ASR) output using discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine online Principal Component Analysis (PCA) as a dimensionality reduction method. We use the ranking perceptron and the ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a reduction in word error rate (WER) of 0.4%, significant at p < 0.001, over the baseline perceptron result.
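As an illustration of the reranking idea summarized above, the following is a minimal sketch of a pairwise ranking perceptron applied to N-best lists. It is not the authors' implementation: the update rule is a generic pairwise variant, and the names `train_ranking_perceptron` and `rerank`, the learning rate, and the toy feature vectors and error counts are all assumptions made for the example.

```python
import numpy as np

def train_ranking_perceptron(nbest_lists, n_features, epochs=10, lr=1.0):
    """Pairwise ranking perceptron (generic variant, assumed for illustration).

    nbest_lists: list of (feats, errs) pairs, one per utterance, where
      feats has shape (n_hyps, n_features) and errs[i] is the number of
      word errors in hypothesis i.
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for feats, errs in nbest_lists:
            n = len(errs)
            for i in range(n):
                for j in range(n):
                    # Hypothesis i should outrank j when it has fewer errors.
                    if errs[i] < errs[j] and w @ feats[i] <= w @ feats[j]:
                        w += lr * (feats[i] - feats[j])
    return w

def rerank(feats, w):
    """Return the index of the highest-scoring hypothesis."""
    return int(np.argmax(feats @ w))

# Toy data (hypothetical): two utterances with 2-dimensional features.
feats1 = np.array([[1.0, 0.0], [0.0, 1.0]])
errs1 = np.array([0, 2])
feats2 = np.array([[1.0, 1.0], [0.0, 1.0], [0.0, 2.0]])
errs2 = np.array([1, 3, 4])

w = train_ranking_perceptron([(feats1, errs1), (feats2, errs2)], n_features=2)
best1 = rerank(feats1, w)
best2 = rerank(feats2, w)
```

In this sketch every misordered pair triggers an update, so the cost grows quadratically with the N-best list size; this is the kind of cost that sampling a subset of hypotheses is meant to reduce.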
Bibliographic reference. Dikici, Erinç / Semerci, Murat / Saraçlar, Murat / Alpaydın, Ethem (2011): "Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models", In INTERSPEECH-2011, 1461-1464.