International Workshop on Spoken Language Translation (IWSLT) 2008

Honolulu, Hawaii, USA
October 20-21, 2008

The CMU Syntax-Augmented Machine Translation System: SAMT on Hadoop with N-best alignments

Andreas Zollmann, Ashish Venugopal, Stephan Vogel

interACT, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

We present the CMU Syntax-Augmented Machine Translation System used in the IWSLT-08 evaluation campaign. We participated in the Full-BTEC data track for Chinese-English translation, focusing on transcript translation. For this year's evaluation, we ported the Syntax-Augmented MT (SAMT) toolkit [1] to the Hadoop MapReduce [2] parallel processing architecture. This allowed us to efficiently run experiments evaluating a novel "wider pipelines" approach that integrates evidence from N-best alignments into our translation models. We describe each step of the MapReduce pipeline as it is implemented in the open-source SAMT toolkit, and show improvements in translation quality from using N-best alignments in both hierarchical and syntax-augmented translation systems.
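The abstract does not spell out how N-best alignment evidence enters the models. Purely as a hedged illustration of the general MapReduce counting pattern such a pipeline could use, the sketch below runs a map step over every alignment in each sentence pair's N-best list and a reduce step that sums fractional counts per key. Several things here are assumptions, not the SAMT toolkit's implementation: weighting each alignment by its normalized score, counting single aligned word pairs instead of full SAMT rules, running in-process instead of on Hadoop, and the hypothetical names `map_sentence` and `reduce_counts`.

```python
from collections import defaultdict
from itertools import chain


def map_sentence(src, tgt, nbest):
    """Map step (hypothetical): for each of the N alignments, emit
    ((src_word, tgt_word), weight) for every alignment link.
    Assumption: each alignment's weight is its score normalized over the N-best list."""
    total = sum(weight for _, weight in nbest)
    for links, weight in nbest:
        for i, j in links:
            yield (src[i], tgt[j]), weight / total


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum fractional counts per key.
    On Hadoop, the shuffle phase would group identical keys before this step."""
    counts = defaultdict(float)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


# Toy corpus: one sentence pair with a 2-best alignment list.
# Each alignment is a set of (source_index, target_index) links plus a score.
corpus = [
    (["ni", "hao"], ["hello"],
     [({(0, 0), (1, 0)}, 0.6), ({(1, 0)}, 0.4)]),
]

counts = reduce_counts(chain.from_iterable(map_sentence(*ex) for ex in corpus))
for key in sorted(counts):
    print(key, round(counts[key], 3))
```

With a single 1-best alignment, every link would contribute a count of 1. With N-best lists, links that appear in several alignments (here "hao"/"hello") accumulate more evidence than links found in only one.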


Bibliographic reference. Zollmann, Andreas / Venugopal, Ashish / Vogel, Stephan (2008): "The CMU syntax-augmented machine translation system: SAMT on Hadoop with n-best alignments", In IWSLT-2008, pp. 18-25.