International Workshop on Spoken Language Translation (IWSLT) 2011
San Francisco, CA, USA
To improve machine translation systems efficiently, we propose a method that selects data to be annotated (i.e., manually translated) from speech-to-speech translation field data. The field data come from experiments conducted in five areas of Japan during fiscal year 2009. For the selection experiments, we used the data sets from two of these areas: the one giving the lowest baseline speech translation performance on its test set, and the one giving the highest. We compare two methods for selecting the field data to be manually translated. Both use source-side language models for data selection, but in different ways. The experimental results show that at least one of the two methods yields a larger improvement than random data selection.
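The abstract does not specify how the two methods use the source-side language models. The sketch below shows one plausible way, and it is explicitly an assumption, not either of the authors' methods. It trains an add-one smoothed bigram language model on hypothetical existing training sentences, scores each candidate field sentence by per-token cross-entropy, and selects the k sentences the model finds most surprising, on the theory that these are least covered by the current training data. A random-selection baseline is included for comparison. All function names and example sentences are illustrative.

```python
import math
import random
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams (with boundary tokens) for an add-one smoothed bigram LM."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    return unigrams, bigrams, vocab_size


def cross_entropy(sentence, lm):
    """Per-token cross-entropy (in bits) of a sentence under the smoothed bigram LM."""
    unigrams, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing; unseen history words fall back to 1 / vocab_size.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / (len(tokens) - 1)


def select_by_lm(candidates, lm, k):
    """Hypothetical criterion: pick the k candidates with the highest cross-entropy."""
    return sorted(candidates, key=lambda s: cross_entropy(s, lm), reverse=True)[:k]


def select_random(candidates, k, seed=0):
    """Random-selection baseline, seeded for reproducibility."""
    return random.Random(seed).sample(candidates, k)


# Illustrative data only: existing training sentences and untranslated field sentences.
train = ["where is the station", "how much is this ticket", "where is the hotel"]
field = [
    "where is the station",
    "can you recommend a ramen shop nearby",
    "how much is this",
    "is there a pharmacy open late",
]
lm = train_bigram_lm(train)
print("LM-based:", select_by_lm(field, lm, 2))
print("Random:  ", select_random(field, 2))
```

With this toy data, the two field sentences whose words never appear in the training set are ranked first. A sentence already present in the training data ("where is the station") is ranked last. Selecting for the opposite extreme, i.e. the sentences most similar to a target domain, would be an equally simple variant of the same scoring function.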
Bibliographic reference. Yasuda, Keiji / Okuma, Hideo / Utiyama, Masao / Sumita, Eiichiro (2011): "Annotating data selection for improving machine translation", In IWSLT-2011, 269-274.