INTERSPEECH 2013
14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Heuristic Selection of Training Sentences from Historical TV Guide for Semi-Supervised LM Adaptation

Harry M. Chang

AT&T Labs Research, USA

This paper describes a novel approach to automatically selecting training sentences from a system-generated data feed in order to build the high-precision language models (LMs) required by speech-enabled voice interface applications in the TV search domain. We develop a set of heuristic rules that select training sentences directly from the TV electronic programming guide (EPG) in its metadata form. Adapted LMs trained on a corpus built by applying these selection algorithms to historical EPG data achieve considerably lower perplexity and a significant reduction in word error rate (WER). When evaluated on user-generated spoken queries to an experimental TV search application, the adapted LMs show a 10% absolute reduction in WER over baseline LMs created without the training sentences generated from the historical EPG data.
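The abstract does not spell out the heuristic rules, so the sketch below only illustrates the general shape of rule-based sentence selection from EPG metadata. Everything in it is a hypothetical assumption, not the author's method: the record field name `title`, the helper names `normalize` and `select_sentences`, and the specific rules (text normalization, a maximum title length in words, and a minimum number of airings in the historical guide).

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace non-alphanumeric characters (except apostrophes)
    with spaces, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def select_sentences(epg_records, min_count=2, max_words=8):
    """Select title-like training sentences from historical EPG records.

    Hypothetical heuristics (illustration only):
      1. normalize each program title;
      2. drop empty titles and titles longer than max_words words;
      3. keep titles that appear at least min_count times in the history,
         assuming recurring programs are likelier targets of spoken queries.
    """
    counts = Counter()
    for rec in epg_records:
        title = normalize(rec.get("title", ""))  # 'title' is an assumed field name
        if title and len(title.split()) <= max_words:
            counts[title] += 1
    selected = [t for t, c in counts.items() if c >= min_count]
    # Order by frequency (descending), then alphabetically, for stable output.
    return sorted(selected, key=lambda t: (-counts[t], t))


records = [
    {"title": "The Daily Show"},
    {"title": "the daily show!"},
    {"title": "Late Movie: A Very Long Title That Goes On And On Forever"},
    {"title": "NOVA"},
    {"title": "Nova"},
    {"title": "Pilot Special"},
]
print(select_sentences(records))  # ['nova', 'the daily show']
```

In this toy history, the over-long title is filtered by length and the single-airing title by frequency. A real system would likely add further rules, such as carrier phrases ("watch ...") or other metadata fields, which this sketch deliberately leaves out.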


Bibliographic reference.  Chang, Harry M. (2013): "Heuristic selection of training sentences from historical TV guide for semi-supervised LM adaptation", In INTERSPEECH-2013, 2227-2231.