Symposium on Machine Learning in Speech and Language Processing (MLSLP)
Portland, Oregon, USA
Discriminative language modeling methods learn language model parameters from system outputs on training data, optimizing objective functions that closely track the actual system objective via algorithms such as the perceptron. For speech recognition, training utterances are recognized with a baseline recognizer, and the output lattices or n-best lists are evaluated against the reference transcripts to determine parameter updates. Reliance on supervised training data limits how much data such methods can exploit, even in the most data-rich scenarios. In this talk, I will review some recent work on discriminative language modeling for speech recognition that aims to augment existing supervised training data by "hallucinating" system outputs in cases where no system inputs exist, e.g., from text corpora. We present several approaches to hallucinating confusion sets for a given reference string, and compare the results achieved with these methods on large-vocabulary continuous speech recognition tasks. We demonstrate real system improvements using such methods, and discuss potential future directions for the work.
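The training loop described above can be sketched as a structured-perceptron update over n-best lists: move the weights toward the lowest-error (oracle) hypothesis and away from the current model's best-scoring hypothesis. This is a minimal illustrative sketch, not the authors' exact formulation; the unigram/bigram feature choice and all function names are assumptions.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between reference and hypothesis."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def features(words):
    """Sparse n-gram features (unigram and bigram counts), an assumed
    feature set for illustration."""
    feats = {}
    padded = ["<s>"] + words + ["</s>"]
    for w in padded:
        feats[(w,)] = feats.get((w,), 0) + 1
    for a, b in zip(padded, padded[1:]):
        feats[(a, b)] = feats.get((a, b), 0) + 1
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_update(weights, reference, nbest):
    """One perceptron step on one utterance's n-best list: promote the
    oracle (lowest-WER) hypothesis, demote the model's current best."""
    oracle = min(nbest, key=lambda h: edit_distance(reference, h))
    model_best = max(nbest, key=lambda h: score(weights, features(h)))
    if model_best != oracle:
        for f, v in features(oracle).items():
            weights[f] = weights.get(f, 0.0) + v
        for f, v in features(model_best).items():
            weights[f] = weights.get(f, 0.0) - v
    return weights
```

Under this view, the hallucination idea in the abstract amounts to replacing the recognizer-produced n-best list with simulated confusions generated from text alone; the update rule itself is unchanged.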
Bibliographic reference. Roark, Brian / Celebi, Arda / Dikici, Erinc / Khudanpur, Sanjeev / Lehr, Maider / Prud'hommeaux, Emily / Sagae, Kenji / Saraclar, Murat / Shafran, Izhak / Xu, Puyang (2012): "Hallucinating system outputs for discriminative language modeling", In MLSLP-2012 (abstract).