EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Back-Off Smoothing Evaluation over Syntactic Language Models

A. Varona, I. Torres

Facultad de Ciencias, UPV-EHU, Spain

Continuous Speech Recognition (CSR) systems require a Language Model (LM) to represent the syntactic constraints of the language. When developing an LM, a smoothing technique must be applied so that events not represented in the training corpus are also taken into account. In this work, several back-off smoothing approaches are compared: classical discounting-distribution schemes, including Witten-Bell, Absolute and Linear discounting, and a new proposal, Delimited discounting. Delimited discounting addresses the problems of Turing discounting while keeping Katz's smoothing scheme. The experimental evaluation was carried out on a Spanish speech application task. It shows that an increase in the test-set perplexity of an LM does not always mean a degradation in model performance when the LM is integrated into a CSR system. Moreover, there is a strong dependence between the amount of probability that the smoothing technique reserves for unseen events and the value of the balance parameter, applied to the LM probabilities in Bayes's rule, that is needed to obtain the best system performance.
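As an illustration of the quantities discussed above, the following minimal Python sketch computes the probability mass that three of the classical discounting schemes reserve for unseen events after a single history. It uses common textbook forms of Witten-Bell, Absolute and Linear discounting, which may differ from the exact formulations evaluated in the paper. Delimited discounting is not sketched because the abstract does not define it. The function names, the toy counts, and the parameters `b`, `alpha` and `balance` are hypothetical, and the log-domain weighting of the LM score is an assumption about how the balance parameter enters Bayes's rule.

```python
def discount(counts, scheme, b=0.5, alpha=0.1):
    """Return (seen_probs, reserved_mass) for one history.

    counts: dict mapping each seen event to its count after the history.
    b:      absolute discount, assumed 0 < b < 1 (hypothetical default).
    alpha:  linear discount, assumed 0 < alpha < 1 (hypothetical default).
    """
    n = sum(counts.values())  # total events observed after this history
    t = len(counts)           # distinct events observed after this history
    if scheme == "witten_bell":
        seen = {w: c / (n + t) for w, c in counts.items()}
        reserved = t / (n + t)
    elif scheme == "absolute":
        seen = {w: (c - b) / n for w, c in counts.items()}
        reserved = b * t / n
    elif scheme == "linear":
        seen = {w: (1.0 - alpha) * c / n for w, c in counts.items()}
        reserved = alpha
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return seen, reserved


def combined_log_score(acoustic_logprob, lm_logprob, balance):
    """Assumed form of the weighted Bayes's rule in the log domain:
    log P(X|W) + balance * log P(W)."""
    return acoustic_logprob + balance * lm_logprob


# Toy counts for one history (made-up numbers, not data from the paper).
toy_counts = {"a": 3, "b": 1}
for scheme in ("witten_bell", "absolute", "linear"):
    seen, reserved = discount(toy_counts, scheme)
    print(scheme, "reserved mass:", round(reserved, 3))
```

In each scheme the discounted probabilities of the seen events plus the reserved mass sum to one. In a back-off model the reserved mass is what gets redistributed over unseen events through the lower-order distribution, which is why its size can interact with the weight given to the LM score.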

Bibliographic reference. Varona, A. / Torres, I. (2001): "Back-off smoothing evaluation over syntactic language models", in EUROSPEECH-2001, 2135-2138.