Dynamic Transcription for Low-Latency Speech Translation

Jan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel

Latency is one of the main challenges in simultaneous spoken language translation. While significant improvements in recent years have led to high-quality automatic translations, their usefulness in real-time settings is still severely limited by the large delay between the input speech and the delivered translation.

In this paper, we present a novel scheme that drastically reduces the latency of a large-scale speech translation system. Within this scheme, the transcribed text and its translation can be updated when more context becomes available, even after they have been presented to the user. This allows us to display an initial transcript and its translation to the user with very low latency. If necessary, both transcript and translation can later be updated to better, more accurate versions until eventually the final versions are displayed. Using this framework, we are able to cut the latency of the source language transcript in half. For the translation, an average delay of 3.3s was achieved, which is more than twice as fast as our initial system.
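The update mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RevisableDisplay`, the `Segment` type, the segment ids, and the `final` flag are all hypothetical names chosen for illustration. The sketch only assumes that each displayed piece of text can be addressed by an id and overwritten until a final version arrives.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Segment:
    text: str
    final: bool = False


@dataclass
class RevisableDisplay:
    """Hypothetical display buffer: segments may be overwritten until final."""
    segments: Dict[int, Segment] = field(default_factory=dict)

    def update(self, seg_id: int, text: str, final: bool = False) -> bool:
        """Show or revise a segment; return True if the stored segment changed."""
        current = self.segments.get(seg_id)
        if current is not None and current.final:
            return False  # a final version is never revised again
        if current is not None and current.text == text and current.final == final:
            return False  # nothing new to display
        self.segments[seg_id] = Segment(text, final)
        return True

    def render(self) -> str:
        """Join all segments in id order, as the user would see them."""
        return " ".join(self.segments[k].text for k in sorted(self.segments))


display = RevisableDisplay()
display.update(0, "I sink")                # early, low-latency hypothesis
display.update(1, "so")
display.update(0, "I think", final=True)   # revised once more context is available
display.update(0, "late revision")         # ignored: segment 0 is already final
print(display.render())
```

In this sketch the first hypothesis is shown immediately and later replaced, which mirrors the trade-off in the abstract: the user sees text early, at the cost of occasional visible corrections. The same buffer could hold either transcript or translation segments.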

DOI: 10.21437/Interspeech.2016-154

Cite as

Niehues, J., Nguyen, T.S., Cho, E., Ha, T., Kilgour, K., Müller, M., Sperber, M., Stüker, S., Waibel, A. (2016) Dynamic Transcription for Low-Latency Speech Translation. Proc. Interspeech 2016, 2513-2517.

author={Jan Niehues and Thai Son Nguyen and Eunah Cho and Thanh-Le Ha and Kevin Kilgour and Markus Müller and Matthias Sperber and Sebastian Stüker and Alex Waibel},
title={Dynamic Transcription for Low-Latency Speech Translation},
booktitle={Interspeech 2016},