EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


The Study of the Effect of Training Set on Statistical Language Modeling

Xipeng Shen, Bo Xu

Institute of Automation, Chinese Academy of Sciences, P.R. China

This work studies the effect of the training set on statistical language modeling (SLM). We present a perplexity-based corpus selection system and test it in two experiments: the first selects an optimal training corpus for a domain-specific SLM; the second builds an optimal SLM for a large-vocabulary continuous speech recognition (LVCSR) system. The results show that the training corpus has an important effect on SLM performance and that the corpus selection system is effective at choosing a suitable corpus. Using this system, we generated an SLM for an LVCSR system that yielded a 14.5% to 17.7% relative reduction in character errors.
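
The abstract does not give the details of the selection procedure, so the following is only a minimal sketch of what perplexity-based corpus selection could look like, not the authors' system. It assumes an add-one-smoothed unigram model, a small held-out sample of in-domain text, and a cutoff `keep`; the function names (`train_unigram`, `perplexity`, `select_corpora`) and the toy data are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Build an add-one-smoothed unigram model (an assumption, not the paper's model)."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence: exp of the negative mean log-probability."""
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def select_corpora(candidates, held_out, keep):
    """Rank candidate corpora by the perplexity their models assign to the
    held-out in-domain text, and return the names of the `keep` best ones."""
    scored = []
    for name, tokens in candidates.items():
        model = train_unigram(tokens)
        scored.append((perplexity(model, held_out), name))
    scored.sort()  # lower perplexity means a better match to the target domain
    return [name for _, name in scored[:keep]]


# Toy data, invented for illustration.
candidates = {
    "news": "the market rose as the bank cut rates".split(),
    "sports": "the team won the match in extra time".split(),
}
held_out = "the bank cut rates".split()
selected = select_corpora(candidates, held_out, keep=1)
print(selected)  # ['news']
```

A practical system would presumably use higher-order n-gram models and far larger corpora; the sketch only illustrates the ranking idea, namely that a candidate corpus whose model assigns lower perplexity to in-domain text is treated as the better training source.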


Bibliographic reference. Shen, Xipeng / Xu, Bo (2001): "The study of the effect of training set on statistical language modeling", in EUROSPEECH-2001, 721-724.