International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Named Entity Translation using Anchor Texts

Wang Ling (1,2), Pável Calado (1), Bruno Martins (1), Isabel Trancoso (1), Alan Black (2), Luísa Coheur (1)

(1) L2F Spoken Systems Lab, INESC-ID, Lisboa, Portugal
(2) Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

This work describes a process for extracting Named Entity (NE) translations from the text of web links (anchor texts). It translates an NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents, and selecting the best translation among those anchor texts using a combination of features, some of which are specific to anchor texts. Experiments performed on a manually built corpus suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using anchor texts alone. Tests on a Machine Translation task indicate that the system can be used to improve the translation quality of state-of-the-art statistical machine translation systems.
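
The extraction and selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `AnchorCollector` and `rank_candidates` are hypothetical, the document retrieval step is assumed to have already produced `target_urls`, and plain frequency counting stands in for the paper's combination of features, which the abstract does not specify.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse internal whitespace so variants compare equal.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def rank_candidates(pages, target_urls):
    """Rank anchor texts of links pointing at target_urls.

    Assumption: raw frequency is a placeholder score, not the
    feature combination used in the paper.
    """
    counts = Counter()
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.pairs:
            if href in target_urls:
                counts[text] += 1
    return counts.most_common()
```

Calling `rank_candidates(pages, target_urls)` with a list of HTML strings and a set of target-language document URLs returns candidate translations ordered from most to least frequent anchor text.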

Bibliographic reference.  Ling, Wang / Calado, Pável / Martins, Bruno / Trancoso, Isabel / Black, Alan / Coheur, Luísa (2011): "Named entity translation using anchor texts", In IWSLT-2011, 206-213.