12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models

Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın

Boğaziçi Üniversitesi, Turkey

This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM). DLM is a feature-based language modeling approach that aims to rerank the ASR output using discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine the use of online Principal Component Analysis (PCA) as a dimensionality reduction method. We use the ranking perceptron and the ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a reduction in word error rate (WER) of 0.4%, significant at p < 0.001 over the baseline perceptron result.
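
As a rough illustration of what reranking with a ranking perceptron involves, the following minimal sketch trains a pairwise perceptron on N-best lists and then picks the highest-scoring hypothesis. It is not the authors' implementation: the function names (`train_ranking_perceptron`, `rerank`), the unit-step update rule, and the toy feature vectors and error counts are all assumptions made for illustration, and the sketch includes neither the data sampling nor the PCA step described in the abstract.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_ranking_perceptron(nbest_lists, n_features, epochs=5):
    """Pairwise ranking perceptron (illustrative, not the paper's exact variant).

    nbest_lists: list of (feats, errs) pairs, one per utterance, where
    feats[i] is the feature vector of hypothesis i and errs[i] is its
    word-error count against the reference.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for feats, errs in nbest_lists:
            for i, fi in enumerate(feats):
                for j, fj in enumerate(feats):
                    # Hypothesis i has fewer errors than j but is not
                    # scored strictly higher: move w toward fi - fj.
                    if errs[i] < errs[j] and dot(w, fi) <= dot(w, fj):
                        w = [wk + a - b for wk, a, b in zip(w, fi, fj)]
    return w


def rerank(feats, w):
    """Return the index of the highest-scoring hypothesis."""
    scores = [dot(w, f) for f in feats]
    return scores.index(max(scores))


# Toy data, invented for this example: two utterances with 3 and 2 hypotheses.
utt1 = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [2, 0, 1])
utt2 = ([[0, 1, 1], [1, 0, 1]], [0, 3])

w = train_ranking_perceptron([utt1, utt2], n_features=3)
best1 = rerank(utt1[0], w)  # index of the chosen hypothesis for utterance 1
best2 = rerank(utt2[0], w)  # index of the chosen hypothesis for utterance 2
```

On this toy data the learned weights select the hypothesis with the fewest word errors in each list. The pairwise loop is quadratic in the N-best list size, which gives some intuition for why sampling hypotheses or pairs can improve training efficiency.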

Bibliographic reference. Dikici, Erinç / Semerci, Murat / Saraçlar, Murat / Alpaydın, Ethem (2011): "Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models", in INTERSPEECH-2011, 1461-1464.