International Workshop on Spoken Language Translation (IWSLT) 2010

Paris, France
December 2-3, 2010

Real-Time Spoken Language Identification and Recognition for Speech-to-Speech Translation

Daniel Chung Yong Lim (1,2), Ian Lane (1), Alex Waibel (1)

(1) Language Technologies Institute, Carnegie Mellon University, USA
(2) DSO National Laboratories, Singapore

For spoken language systems to operate effectively across multiple languages, it is critical to rapidly apply the correct language-specific speech recognition models. Prior approaches either first identify the language being spoken and then select the appropriate language-specific speech recognition engine, or perform speech recognition in all languages in parallel and select the language and recognition hypothesis with maximum likelihood. Both of these approaches, however, introduce a significant delay before back-end natural language processing can proceed. In this work, we propose a novel method for joint language identification and speech recognition that can operate in near real-time. The proposed approach compares partial hypotheses generated on-the-fly during decoding and generates a classification decision soon after the first full hypothesis has been generated. When applied within our English-Iraqi speech-to-speech translation system, the proposed approach correctly identified the input language with 99.6% accuracy while introducing minimal delay to the end-to-end system.
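The decision step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how partial hypotheses are scored or compared, so the per-frame normalised log-likelihood, the `Hypothesis` record, and the `identify_language` function are all assumptions introduced here for illustration. The sketch assumes that language-specific decoders run in parallel and emit partial hypotheses into one interleaved stream, and that the decision is taken as soon as any decoder emits a full hypothesis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Hypothesis:
    """One (partial or full) recognition hypothesis from one decoder (hypothetical record)."""
    language: str
    text: str
    log_likelihood: float  # total log-likelihood of the hypothesis so far
    frames: int            # number of audio frames the hypothesis covers
    is_final: bool = False

    @property
    def score(self) -> float:
        # Assumption: normalise by frame count so that hypotheses covering
        # different amounts of audio can be compared.
        return self.log_likelihood / max(self.frames, 1)


def identify_language(stream: Iterable[Hypothesis]) -> Optional[Hypothesis]:
    """Return the best-scoring hypothesis once the first full hypothesis arrives."""
    latest: Dict[str, Hypothesis] = {}
    for hyp in stream:
        latest[hyp.language] = hyp  # keep only the newest hypothesis per language
        if hyp.is_final:
            # Decide immediately, without waiting for the other decoders to finish.
            return max(latest.values(), key=lambda h: h.score)
    return None  # the stream ended before any decoder produced a full hypothesis


# Illustrative values only; the texts and scores are placeholders, not real data.
events = [
    Hypothesis("english", "where is", -420.0, 60),
    Hypothesis("iraqi", "<partial hypothesis>", -510.0, 60),
    Hypothesis("english", "where is the hospital", -1250.0, 180, is_final=True),
    Hypothesis("iraqi", "<full hypothesis>", -1500.0, 180, is_final=True),
]
best = identify_language(events)
print(best.language)  # english
```

In this sketch the decision is made on the third event, so the second decoder's full hypothesis is never awaited; that early exit is what keeps the added delay small compared with selecting among complete parallel hypotheses.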

Index Terms. Language Identification, Speech Recognition, Multilingual Spoken Language Understanding


Bibliographic reference.  Lim, Daniel Chung Yong / Lane, Ian / Waibel, Alex (2010): "Real-time spoken language identification and recognition for speech-to-speech translation", In IWSLT-2010, 307-312.