International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Annotating Data Selection for Improving Machine Translation

Keiji Yasuda, Hideo Okuma, Masao Utiyama, Eiichiro Sumita

National Institute of Information and Communications Technology, Japan

To improve machine translation systems efficiently, we propose a method that selects data to be annotated (manually translated) from speech-to-speech translation field data. The field data come from experiments conducted during the 2009 fiscal year in five areas of Japan. For the selection experiments, we used data sets from two of these areas: the one giving the lowest baseline speech translation performance on its test set and the one giving the highest. We compare two methods for selecting the data to be manually translated from the field data. Both use source-side language models for data selection, but in different ways. According to the experimental results, one or both of the methods yield larger improvements than random data selection.
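
The abstract does not specify how the two methods use the source-side language model, so the following is only an illustrative sketch of one plausible variant, not the authors' procedure. It assumes an add-one smoothed unigram model trained on the source side of existing training data, and it assumes that field utterances with the highest perplexity under that model (those least covered by current data) are the ones sent for manual translation. All function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Build an add-one smoothed unigram model; return a log-probability function.

    Assumption: whitespace tokenization and a unigram model stand in for
    whatever source-side language model a real system would use.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens

    def logprob(token):
        # Counter returns 0 for tokens never seen in training.
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def perplexity(sentence, logprob):
    """Per-token perplexity of a non-empty sentence under the model."""
    tokens = sentence.split()
    return math.exp(-sum(logprob(t) for t in tokens) / len(tokens))


def select_for_annotation(field_sentences, logprob, k):
    """Return the k field sentences with the highest perplexity.

    Assumption: high perplexity marks utterances poorly covered by the
    existing training data, so translating them should help most.
    """
    candidates = [s for s in field_sentences if s.split()]  # skip empty lines
    ranked = sorted(candidates, key=lambda s: perplexity(s, logprob), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    training_source = ["where is the station", "where is the hotel"]
    field_data = ["where is the station", "how much is the ticket"]
    lm = train_unigram_lm(training_source)
    print(select_for_annotation(field_data, lm, k=1))
```

A random-selection baseline, as used for comparison in the abstract, would simply sample k sentences from the field data without consulting the model.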

Bibliographic reference. Yasuda, Keiji / Okuma, Hideo / Utiyama, Masao / Sumita, Eiichiro (2011): "Annotating data selection for improving machine translation", in IWSLT-2011, 269-274.