International Workshop on Spoken Language Translation (IWSLT) 2011
San Francisco, CA, USA
In this paper, we describe Dublin City University's (DCU) submissions to the IWSLT 2011 evaluation campaign. We participated in the Arabic-English and Chinese-English Machine Translation (MT) track translation tasks. We use phrase-based statistical machine translation (PBSMT) models to build the baseline systems. Because the data to be translated is open-domain, we use domain adaptation techniques to improve translation quality. Furthermore, we explore target-side syntactic augmentation for a Hierarchical Phrase-Based (HPB) SMT model. Combinatory Categorial Grammar (CCG) is used to extract labels for target-side phrases and non-terminals in the HPB system. Combining the domain-adapted language models with the CCG-augmented HPB system gave us the best translations for both language pairs, providing statistically significant improvements of 6.09 absolute BLEU points (25.94% relative) and 1.69 absolute BLEU points (15.89% relative) over the unadapted PBSMT baselines for the Arabic-English and Chinese-English language pairs, respectively.
Bibliographic reference. Banerjee, Pratyush / Almaghout, Hala / Naskar, Sudip / Roturier, Johann / Jiang, Jie / Way, Andy / Genabith, Josef van (2011): "The DCU machine translation systems for IWSLT 2011", In IWSLT-2011, 41-48.