International Workshop on Spoken Language Translation (IWSLT) 2010

Paris, France
December 2-3, 2010

Towards a General and Extensible Phrase-Extraction Algorithm

Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso

L2F Spoken Systems Lab, INESC-ID Lisboa, Portugal

Phrase-based systems depend heavily on the quality of their phrase tables, so phrase extraction is a fundamental step. In this paper we present a general and extensible phrase-extraction algorithm in which we highlight several control points. Because a different strategy or heuristic can be tested at each of these points, instantiating them allows previous approaches to be simulated. We show how previous approaches fit into this algorithm, compare several of them, and propose alternative heuristics, showing their impact on the final translation results. On two test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we obtained improvements of 2.4 and 2.8 BLEU points, respectively.

Bibliographic reference. Ling, Wang / Luís, Tiago / Graça, João / Coheur, Luísa / Trancoso, Isabel (2010): "Towards a general and extensible phrase-extraction algorithm", in IWSLT-2010, 313-320.