International Workshop on Spoken Language Translation (IWSLT) 2010
Phrase-based systems depend heavily on the quality of their phrase tables, so phrase extraction is a fundamental step. In this paper we present a general and extensible phrase extraction algorithm in which we highlight several control points. Because different strategies and heuristics can be tested at each control point, instantiating them allows previous approaches to be simulated. We show how previous approaches fit into this algorithm, compare several of them, and propose alternative heuristics, showing their impact on the final translation results. In two test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we obtained improvements of 2.4 and 2.8 BLEU points, respectively.
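To make the idea of a control point concrete, the following is a minimal sketch of alignment-based phrase-pair extraction using the widely used consistency criterion. It is not the algorithm of the paper, whose control points are not spelled out in this abstract. The names `extract_phrases`, `is_consistent`, `accept` and `max_len` are hypothetical, and the `accept` callback stands in for a single illustrative control point where a different acceptance heuristic could be plugged in.

```python
from typing import Callable, List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # word links as (source index, target index)
Span = Tuple[int, int]            # inclusive (start, end) word positions


def is_consistent(alignment: Alignment, src: Span, tgt: Span) -> bool:
    """Standard consistency check: at least one link lies inside the
    phrase pair, and no link connects a word inside it to a word outside."""
    has_link = False
    for s, t in alignment:
        in_src = src[0] <= s <= src[1]
        in_tgt = tgt[0] <= t <= tgt[1]
        if in_src != in_tgt:      # link crosses the phrase-pair boundary
            return False
        if in_src:                # here in_tgt is True as well
            has_link = True
    return has_link


def extract_phrases(
    source: List[str],
    target: List[str],
    alignment: Alignment,
    max_len: int = 7,  # assumed length limit, not taken from the paper
    accept: Callable[[Alignment, Span, Span], bool] = is_consistent,
) -> List[Tuple[str, str]]:
    """Enumerate all span pairs up to max_len words and keep those that
    the `accept` heuristic (the hypothetical control point) approves."""
    pairs = []
    for s1 in range(len(source)):
        for s2 in range(s1, min(s1 + max_len, len(source))):
            for t1 in range(len(target)):
                for t2 in range(t1, min(t1 + max_len, len(target))):
                    if accept(alignment, (s1, s2), (t1, t2)):
                        pairs.append((" ".join(source[s1:s2 + 1]),
                                      " ".join(target[t1:t2 + 1])))
    return pairs


# Toy example (invented data): "la maison bleue" / "the blue house"
pairs = extract_phrases(
    ["la", "maison", "bleue"],
    ["the", "blue", "house"],
    {(0, 0), (1, 2), (2, 1)},
)
```

Passing a different function as `accept`, for example one that also rejects pairs with very unequal lengths, changes the extraction strategy without touching the enumeration loop. That is the sense in which a single point of variation can reproduce or replace an existing heuristic.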
Bibliographic reference. Ling, Wang / Luís, Tiago / Graça, João / Coheur, Luísa / Trancoso, Isabel (2010): "Towards a general and extensible phrase-extraction algorithm", In IWSLT-2010, 313-320.