International Workshop on Spoken Language Translation (IWSLT) 2006

Keihanna Science City, Kyoto, Japan
November 27-28, 2006

Automatic Sentence Segmentation and Punctuation Prediction for Spoken Language Translation

Evgeny Matusov, Arne Mauser, Hermann Ney

Lehrstuhl für Informatik 6, RWTH Aachen University, Aachen, Germany

This paper studies the impact of automatic sentence segmentation and punctuation prediction on the quality of machine translation of automatically recognized speech. We present a novel sentence segmentation method that is specifically tailored to the requirements of machine translation algorithms and is competitive with state-of-the-art approaches for detecting sentence-like units. We also describe and compare three strategies for predicting punctuation in a machine translation framework, including simple and effective implicit punctuation generation by a statistical phrase-based machine translation system. Our experiments on the IWSLT Chinese-to-English and TC-STAR English-to-Spanish speech translation tasks show that the proposed sentence segmentation and punctuation prediction approaches perform robustly in terms of translation quality.

Bibliographic reference. Matusov, Evgeny / Mauser, Arne / Ney, Hermann (2006): "Automatic sentence segmentation and punctuation prediction for spoken language translation", in IWSLT-2006, 158-165.