International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

The DCU Machine Translation Systems for IWSLT 2011

Pratyush Banerjee (1), Hala Almaghout (1), Sudip Naskar (1), Johann Roturier (3), Jie Jiang (2), Andy Way (2), Josef van Genabith (1)

(1) CNGL, School of Computing, Dublin City University, Dublin, Ireland
(2) Applied Language Solutions, Delph, UK
(3) Symantec Limited, Dublin, Ireland

In this paper, we describe Dublin City University's (DCU) submissions to the IWSLT 2011 evaluation campaign. We participated in the Arabic-English and Chinese-English translation tasks of the Machine Translation (MT) track. We use phrase-based statistical machine translation (PBSMT) models to build the baseline system. Because the data to be translated is open-domain, we use domain adaptation techniques to improve translation quality. Furthermore, we explore target-side syntactic augmentation for a Hierarchical Phrase-Based (HPB) SMT model, using Combinatory Categorial Grammar (CCG) to extract labels for target-side phrases and non-terminals in the HPB system. Combining the domain-adapted language models with the CCG-augmented HPB system gave the best translations for both language pairs, providing statistically significant improvements of 6.09 absolute BLEU points (25.94% relative) and 1.69 absolute BLEU points (15.89% relative) over the unadapted PBSMT baselines for Arabic-English and Chinese-English, respectively.
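The absolute and relative figures above are linked by relative gain = 100 × absolute gain / baseline BLEU. As a rough consistency check, the sketch below back-computes the baseline score implied by each pair. These baselines are not stated in the abstract itself, and because the reported percentages are rounded, the recovered values are only approximations. The function names are illustrative and do not come from the paper.

```python
def relative_improvement(absolute_gain: float, baseline: float) -> float:
    """Relative gain in percent: 100 * absolute_gain / baseline."""
    return 100.0 * absolute_gain / baseline


def implied_baseline(absolute_gain: float, relative_pct: float) -> float:
    """Recover the baseline score implied by an absolute gain and a relative gain (in percent)."""
    return absolute_gain / (relative_pct / 100.0)


# Absolute and relative BLEU gains as reported in the abstract.
reported = [
    ("Arabic-English", 6.09, 25.94),
    ("Chinese-English", 1.69, 15.89),
]

for pair, gain, rel in reported:
    base = implied_baseline(gain, rel)
    # Approximate only: the inputs are rounded to two decimals.
    print(f"{pair}: implied baseline ~{base:.2f} BLEU, best system ~{base + gain:.2f} BLEU")
```

Running the loop gives implied baselines of roughly 23.5 BLEU for Arabic-English and 10.6 BLEU for Chinese-English. This also shows why the smaller absolute gain for Chinese-English still amounts to a sizeable relative improvement.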


Bibliographic reference.  Banerjee, Pratyush / Almaghout, Hala / Naskar, Sudip / Roturier, Johann / Jiang, Jie / Way, Andy / Genabith, Josef van (2011): "The DCU machine translation systems for IWSLT 2011", In IWSLT-2011, 41-48.