International Workshop on Spoken Language Translation (IWSLT) 2006

Keihanna Science City, Kyoto, Japan
November 27-28, 2006

Continuous Space Language Models for the IWSLT 2006 Task

Holger Schwenk (1), Marta R. Costa-jussà (2), José A. R. Fonollosa (2)

(1) LIMSI-CNRS, Orsay, France / (2) Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

The language model of the target language plays an important role in statistical machine translation systems. In this work, we propose a new statistical language model based on a continuous representation of the words in the vocabulary. A neural network performs both the projection and the probability estimation. This approach is particularly promising for tasks where only very limited resources are available, such as the BTEC corpus of tourism-related questions.
   This language model is used in two state-of-the-art statistical machine translation systems developed by UPC for the 2006 IWSLT evaluation campaign: a phrase-based and an n-gram-based system. An experimental evaluation is provided for four language pairs (translation from Mandarin, Japanese, Arabic and Italian into English). The proposed method improved the BLEU score by up to 3 points on the development data and by almost 2 points on the official test data.
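
The abstract describes a neural network that projects words into a continuous space and estimates probabilities from that representation. The following is only a minimal sketch of that general idea, not the authors' implementation: the class name ContinuousSpaceLM, the layer sizes, the tanh hidden layer, the softmax output, and the toy vocabulary are all assumptions made for illustration, and the weights are random and untrained, so the resulting probabilities carry no meaning beyond forming a valid distribution.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector, bias):
    """Compute matrix * vector + bias using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ContinuousSpaceLM:
    """Hypothetical feed-forward language model (illustrative names and sizes)."""

    def __init__(self, vocab_size, context_size=3, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.context_size = context_size
        # Projection: one continuous vector per vocabulary word.
        self.projection = make_matrix(vocab_size, embed_dim, rng)
        self.hidden_w = make_matrix(hidden_dim, context_size * embed_dim, rng)
        self.hidden_b = [0.0] * hidden_dim
        self.output_w = make_matrix(vocab_size, hidden_dim, rng)
        self.output_b = [0.0] * vocab_size

    def next_word_probs(self, context_ids):
        """Return P(word | context) for every word in the vocabulary."""
        if len(context_ids) != self.context_size:
            raise ValueError("expected %d context words" % self.context_size)
        # Look up and concatenate the continuous vectors of the context words.
        features = [v for word_id in context_ids for v in self.projection[word_id]]
        # Assumed hidden layer with tanh activation.
        hidden = [math.tanh(a) for a in matvec(self.hidden_w, features, self.hidden_b)]
        # Probability estimation over the whole vocabulary.
        return softmax(matvec(self.output_w, hidden, self.output_b))


# Toy vocabulary, chosen only for illustration.
vocab = ["<s>", "where", "is", "the", "station", "</s>"]
lm = ContinuousSpaceLM(vocab_size=len(vocab))
probs = lm.next_word_probs([vocab.index(w) for w in ("where", "is", "the")])
```

Because words with similar projections yield similar hidden activations, a model of this kind can assign reasonable probabilities to contexts never seen in training, which is one plausible reading of why the abstract calls the approach promising for small corpora.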


Bibliographic reference.  Schwenk, Holger / Costa-jussà, Marta R. / Fonollosa, José A. R. (2006): "Continuous space language models for the IWSLT 2006 task", In IWSLT-2006, 166-173.