International Workshop on Spoken Language Translation (IWSLT) 2010

Paris, France
December 2-3, 2010

CCG Augmented Hierarchical Phrase-Based Machine Translation

Hala Almaghout, Jie Jiang, Andy Way

CNGL, School of Computing, Dublin City University, Dublin, Ireland

We present a method for incorporating target-language syntax, in the form of Combinatory Categorial Grammar (CCG), into a Hierarchical Phrase-Based MT system. We adopt the approach of Syntax Augmented Machine Translation (SAMT) and attach syntactic categories to the nonterminals in hierarchical rules, but instead of a constituent grammar we exploit the rich syntactic information and flexible structures of CCG. We present results on the Chinese-English DIALOG IWSLT data and compare them with the Moses SAMT4 and Moses Phrase-Based systems. Our system achieves relative BLEU score increases of 5.47% and 1.18% over the Moses SAMT4 and Phrase-Based systems, respectively. We analyze the reasons behind this improvement and find that our approach has better coverage than the SAMT approach. Furthermore, the CCG-based syntactic categories attached to nonterminals in hierarchical rules prove to be less sparse, and to generalize better, than syntactic categories extracted according to the SAMT method.
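As an illustration of what attaching syntactic categories to nonterminals can look like, the following minimal Python sketch labels the gaps of a target-side phrase with CCG categories. It is not the authors' implementation. The function names, the bracketed label format, the toy sentence and the category table are all hypothetical, and falling back to a generic X label when a span has no single category is an assumption made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def label_span(span: Span, categories: Dict[Span, str], default: str = "X") -> str:
    """Return the CCG category recorded for a span, or a generic label.

    `categories` stands in for the output of a CCG parser (hypothetical here).
    """
    return categories.get(span, default)


def label_rule(target: List[str], gaps: List[Span],
               categories: Dict[Span, str]) -> List[str]:
    """Replace each gap span in the target tokens with a labelled nonterminal."""
    for start, end in gaps:
        if not 0 <= start < end <= len(target):
            raise ValueError(f"invalid gap span: {(start, end)}")
    gap_end_by_start = {start: end for start, end in gaps}
    out: List[str] = []
    i, index = 0, 0
    while i < len(target):
        if i in gap_end_by_start:
            end = gap_end_by_start[i]
            index += 1
            out.append(f"[{label_span((i, end), categories)},{index}]")
            i = end  # skip the tokens covered by the gap
        else:
            out.append(target[i])
            i += 1
    return out


# Toy target sentence and hypothetical CCG categories for some of its spans.
target = ["the", "man", "likes", "red", "apples"]
categories = {
    (0, 2): "NP",     # "the man"
    (3, 5): "NP",     # "red apples"
    (2, 5): "S\\NP",  # "likes red apples"
    (0, 3): "S/NP",   # "the man likes": not a constituent, but one CCG category
}

print(label_rule(target, [(0, 2), (3, 5)], categories))
print(label_rule(target, [(0, 3)], categories))
print(label_rule(target, [(1, 3)], categories))  # no category: generic X
```

The span "the man likes" is not a constituent in a conventional phrase-structure tree, yet it can receive the single CCG category S/NP. This is one way to picture the flexible structures and better coverage that the abstract refers to.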


Bibliographic reference. Almaghout, Hala / Jiang, Jie / Way, Andy (2010): "CCG augmented hierarchical phrase-based machine translation", in IWSLT-2010, 211-218.