EUROSPEECH 2001 Scandinavia
In this work, we study the effect of the training set on statistical language modeling (SLM) and present a corpus selection system based on perplexity. The system is tested in two experiments: the first selects the optimal training corpus for a domain-specific SLM; the second generates an optimal SLM for a large-vocabulary continuous speech recognition (LVCSR) system. The results show that the choice of training corpus matters for the capability of the SLM and that our corpus selection system is effective at selecting a suitable corpus. With the help of this system, we generated an SLM for an LVCSR system that contributed a 14.5% to 17.7% relative reduction in character error rate.
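The abstract does not describe how the selection system works, so the following is only a minimal sketch of what perplexity-based corpus selection can look like, not the authors' method. It assumes an add-one-smoothed unigram model, whitespace-tokenized text, and a small in-domain development set; the names `train_unigram`, `perplexity`, and `rank_corpora` are hypothetical. Each candidate corpus trains its own model, and candidates are ranked by the perplexity that model assigns to the development text (lower is better).

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return (counts, total) for a unigram model (smoothing is applied at query time)."""
    return Counter(tokens), len(tokens)


def perplexity(model, vocab_size, tokens):
    """Perplexity of `tokens` under an add-one-smoothed unigram model."""
    if not tokens:
        raise ValueError("cannot compute perplexity of an empty token list")
    counts, total = model
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing is an assumption made here for simplicity.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def rank_corpora(candidates, dev_tokens):
    """Rank candidate corpora by the perplexity their model gives the dev text.

    `candidates` maps a corpus name to its token list. A shared vocabulary
    (all candidates plus the dev text) keeps the scores comparable.
    """
    vocab = set(dev_tokens)
    for toks in candidates.values():
        vocab.update(toks)
    scores = {
        name: perplexity(train_unigram(toks), len(vocab), dev_tokens)
        for name, toks in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    candidates = {
        "news": "the market fell today".split(),
        "speech": "please call the office today".split(),
    }
    dev = "please call the office".split()
    for name, ppl in rank_corpora(candidates, dev):
        print(f"{name}: perplexity {ppl:.2f}")
```

In practice a higher-order n-gram model with better smoothing would replace the unigram model, and the top-ranked corpora (or those below a perplexity threshold) would be pooled to train the final SLM.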
Bibliographic reference. Shen, Xipeng / Xu, Bo (2001): "The study of the effect of training set on statistical language modeling", In EUROSPEECH-2001, 721-724.