International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Extending a Probabilistic Phrase Alignment Approach for SMT

Mridul Gupta, Sanjika Hewavitharana, Stephan Vogel

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore improving phrase alignment by adding syntactic information, in the form of chunks, as soft constraints; this design is guided by a detailed analysis of a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over the sentence pair [1]. The boundaries of the target phrase are chosen such that the overall sentence alignment probability is optimal. The extended model also incorporates Viterbi alignment information with a view to improving phrase alignment. Phrase pairs are extracted using a relatively large number of features, whose weights are trained discriminatively with a large-margin online learning algorithm, the Margin Infused Relaxed Algorithm (MIRA). Initial experiments show improvements in both phrase alignment and translation quality for Arabic-English on a moderate-size translation task.
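
The boundary optimization described in the abstract can be illustrated with a minimal sketch. It assumes an IBM-1-style lexical model: source words inside the phrase are scored against target words inside the candidate span, and all other source words against target words outside it. The function name `best_target_span`, the `lex` table, the `floor` value, and the toy sentence pair are hypothetical and not taken from the paper; the chunk constraints, Viterbi alignment features, and MIRA-trained weights of the extended model are not shown.

```python
import math

def best_target_span(src, tgt, j1, j2, lex, floor=1e-6):
    """Search all target spans (i1, i2) for the source phrase src[j1..j2].

    Returns (i1, i2, log_prob) for the span with the highest sentence-level
    alignment score. `lex` maps (source_word, target_word) to a lexical
    probability; unseen pairs fall back to `floor` (an assumed constant).
    """
    n_tgt = len(tgt)
    best = None
    for i1 in range(n_tgt):
        for i2 in range(i1, n_tgt):
            inside = list(range(i1, i2 + 1))
            outside = [i for i in range(n_tgt) if i < i1 or i > i2]
            log_prob = 0.0
            for j, s in enumerate(src):
                # Words in the source phrase may only align inside the span,
                # all remaining source words only outside of it.
                positions = inside if j1 <= j <= j2 else outside
                if not positions:
                    # No target word is left to explain this source word.
                    log_prob += math.log(floor)
                    continue
                total = sum(lex.get((s, tgt[i]), floor) for i in positions)
                log_prob += math.log(total / len(positions))
            if best is None or log_prob > best[2]:
                best = (i1, i2, log_prob)
    return best

# Toy data, invented for illustration only.
src = ["das", "haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
lex = {("das", "the"): 0.9, ("haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}

i1, i2, score = best_target_span(src, tgt, 1, 1, lex)
print(tgt[i1:i2 + 1], round(score, 3))
```

The exhaustive search over spans is quadratic in the target length, which is acceptable for a sketch; a practical system would restrict candidate spans (for example by length ratio) before scoring.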

Bibliographic reference. Gupta, Mridul / Hewavitharana, Sanjika / Vogel, Stephan (2011): "Extending a probabilistic phrase alignment approach for SMT", in IWSLT-2011, 175-182.