Speech translation (ST) systems consist of three major components: automatic speech recognition (ASR), machine translation (MT), and speech synthesis (SS). In general, the ASR system is tuned independently to minimize word error rate (WER), but previous research has shown that ASR and MT can be jointly optimized to improve translation quality. Independently, many techniques have recently been proposed for the optimization of MT, such as minimum error rate training (MERT), pairwise ranking optimization (PRO), and the batch margin infused relaxed algorithm (MIRA). The first contribution of this paper is an empirical comparison of these techniques in the context of joint optimization. Because PRO and batch MIRA are able to use sparse features, we also introduce lexicalized features based on the frequencies of recognized words. In addition, motivated by initial results, we propose a hybrid optimization method that changes the translation evaluation measure depending on the features to be optimized. Experimental results for the best combination of algorithm and features show a gain of 1.3 BLEU points at 27% of the computational cost of previous joint optimization methods.
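For readers unfamiliar with the ASR tuning objective named in the abstract, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Illustrative helper (hypothetical name); tokens are split on
    whitespace, which is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" counts one substitution and one deletion over six reference words. Because WER treats every word error equally, a system tuned only on WER has no notion of which errors matter most for translation, which is the motivation for joint optimization stated in the abstract.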
Bibliographic reference. Ohgushi, Masaya / Neubig, Graham / Sakti, Sakriani / Toda, Tomoki / Nakamura, Satoshi (2013): "An empirical comparison of joint optimization techniques for speech translation", In INTERSPEECH-2013, 2619-2623.