International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Protocol and Lessons Learnt from the Production of Parallel Corpora for the Evaluation of Speech Translation Systems

Victoria Arranz (1), Olivier Hamon (1), Karim Boudahmane (2), Martine Garnier-Rizet (3)

(1) ELDA, Paris, France; (2) DGA, Bagneux, France; (3) IMMI, Univ. Paris-Sud, Orsay, France

Machine translation evaluation campaigns require reference corpora against which system output can be measured automatically. This paper describes recent efforts to create such data in order to measure the quality of the systems participating in the Quaero evaluations. In particular, we focus on the protocols behind this production and on the issues raised by the complexity of the transcription data handled.

Bibliographic reference. Arranz, Victoria / Hamon, Olivier / Boudahmane, Karim / Garnier-Rizet, Martine (2011): "Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems", in IWSLT-2011, 129-135.