Classification of sentences based on their meaning (or concept) has been used as a component in speech translation and spoken language understanding systems. Preparing training data for this type of classifier is often a tedious task. In our previous work, we presented a method for clustering sentences as a step toward automated annotation of concepts. To measure the distance between two sentences, that method relied on the local lexical dependencies in their translations. In this work, we apply topic modeling to enhance the previously proposed distance metric so that it also includes information from semantic associations among words. Our experiments on the DARPA USC Transonics and BBN Transtac data sets show the advantage of incorporating this information, in the form of performance improvements across a set of clustering tasks.
Bibliographic reference. Ettelaie, Emil / Georgiou, Panayiotis G. / Narayanan, Shrikanth (2011): "Enhancements to the training process of classifier-based speech translator via topic modeling", In INTERSPEECH-2011, 2109-2112.