ISCA Archive Interspeech 2013

An empirical comparison of joint optimization techniques for speech translation

Masaya Ohgushi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura

Speech translation (ST) systems consist of three major components: automatic speech recognition (ASR), machine translation (MT), and speech synthesis (SS). In general, the ASR system is tuned independently to minimize word error rate (WER), but previous research has shown that ASR and MT can be jointly optimized to improve translation quality. Independently, many techniques have recently been proposed for the optimization of MT, such as minimum error rate training (MERT), pairwise ranking optimization (PRO), and the batch margin infused relaxed algorithm (MIRA). The first contribution of this paper is an empirical comparison of these techniques in the context of joint optimization. As the last two methods are able to use sparse features, we also introduce lexicalized features based on the frequencies of recognized words. In addition, motivated by initial results, we propose a hybrid optimization method that changes the translation evaluation measure depending on the features to be optimized. Experimental results for the best combination of algorithm and features show a gain of 1.3 BLEU points at 27% of the computational cost of previous joint optimization methods.

doi: 10.21437/Interspeech.2013-603

Cite as: Ohgushi, M., Neubig, G., Sakti, S., Toda, T., Nakamura, S. (2013) An empirical comparison of joint optimization techniques for speech translation. Proc. Interspeech 2013, 2619-2623, doi: 10.21437/Interspeech.2013-603

@inproceedings{ohgushi13_interspeech,
  author={Masaya Ohgushi and Graham Neubig and Sakriani Sakti and Tomoki Toda and Satoshi Nakamura},
  title={{An empirical comparison of joint optimization techniques for speech translation}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2619--2623},
  doi={10.21437/Interspeech.2013-603}
}