International Workshop on Spoken Language Translation (IWSLT) 2006

Keihanna Science City, Kyoto, Japan
November 27-28, 2006

IBM Arabic-to-English Translation for IWSLT 2006

Young-Suk Lee

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

We present techniques for improving domain-specific translation quality on test sets with a relatively high out-of-vocabulary (OOV) ratio. The key idea is to maximize vocabulary coverage without degrading translation quality. We maximize vocabulary coverage in two ways: by segmenting each word into a sequence of morphemes of the form prefix*-stem-suffix* (zero or more prefixes, a stem, and zero or more suffixes), and by adding a large amount of out-of-domain training data. To preserve the domain-specific meaning of vocabulary items that occur in both the domain-specific and the out-of-domain training corpora, we assign a higher weight to the domain-specific corpus than to the out-of-domain corpora. IBM Arabic-to-English spoken language translation systems using these techniques achieved the best performance in the Open Data Track of the IWSLT 2006 Evaluation Campaign.
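The abstract does not say how the corpus weighting is implemented. One common way to realize it is to scale the counts contributed by each corpus before relative-frequency estimation, which for an integer weight is equivalent to replicating the domain-specific corpus that many times. The sketch below assumes word-level (source, target) pairs, a hypothetical source token `X`, hypothetical senses `in_sense`/`out_sense`, and a hypothetical weight of 5. It illustrates how up-weighting shifts probability mass toward the in-domain translation, not the system's actual training procedure.

```python
from collections import defaultdict


def weighted_translation_probs(corpora):
    """Estimate p(tgt | src) from weighted co-occurrence counts.

    corpora: list of (weight, pairs), where pairs is a list of (src, tgt) tuples.
    Each occurrence of a pair contributes `weight` to its count instead of 1.
    """
    pair_counts = defaultdict(float)
    src_counts = defaultdict(float)
    for weight, pairs in corpora:
        for src, tgt in pairs:
            pair_counts[(src, tgt)] += weight
            src_counts[src] += weight
    # Relative frequency: weighted pair count divided by weighted source count.
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}


# Hypothetical toy data: the same source token X is translated with one sense
# in the small domain-specific corpus and another sense in the larger
# out-of-domain corpus.
in_domain = [("X", "in_sense")] * 2
out_domain = [("X", "out_sense")] * 6

uniform = weighted_translation_probs([(1.0, in_domain), (1.0, out_domain)])
weighted = weighted_translation_probs([(5.0, in_domain), (1.0, out_domain)])

print(uniform[("X", "in_sense")], weighted[("X", "in_sense")])
```

With equal weights the out-of-domain sense dominates (0.25 vs. 0.75 for the in-domain sense). With the domain-specific corpus weighted by 5, the in-domain sense becomes the more probable translation (0.625 vs. 0.375), while the out-of-domain data still contributes coverage for words the domain corpus lacks.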


Bibliographic reference.  Lee, Young-Suk (2006): "IBM Arabic-to-English translation for IWSLT 2006", In IWSLT-2006, 45-52.