Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995


Lori Lamel, M. Adda-Decker, Jean-Luc Gauvain

LIMSI - CNRS, Orsay, France

In this paper we report on our activities in multilingual, speaker-independent, large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Europe, where each country has its own national language. Our existing recognizer for American English and French has been ported to British English and German. It has been assessed in the context of the LRE SQALE project, whose objective was to experiment with installing in Europe a multilingual evaluation paradigm for the assessment of large vocabulary, continuous speech recognition systems. The recognizer makes use of phone-based continuous density HMMs for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. The system has been evaluated on a dictation task with read, newspaper-based corpora: the ARPA Wall Street Journal corpus of American English, the WSJCAM0 corpus of British English, the BREF-Le Monde corpus of French, and the PHONDAT-Frankfurter Rundschau corpus of German. Under closely matched conditions, the average word accuracy across all 4 languages is 85%, obtained with an open-vocabulary test and 20k trigram systems (a 64k system for German).
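As an illustration of the word accuracy figure quoted above, the following is a minimal sketch assuming the conventional definition: one minus the number of substitutions, deletions, and insertions divided by the number of reference words. The function name `word_accuracy` and the whitespace tokenization are assumptions made for this example. The abstract does not describe the scoring procedure actually used in the evaluation.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (S + D + I) / N, with N the reference length.

    Hypothetical helper: the alignment is a plain Levenshtein distance
    over whitespace-separated words, not an official scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # One substitution out of four reference words.
    print(word_accuracy("a b c d", "a x c d"))
```

Because insertions count as errors, this measure can fall below zero when the hypothesis is much longer than the reference.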


Bibliographic reference. Lamel, Lori / Adda-Decker, M. / Gauvain, Jean-Luc (1995): "ISSUES IN LARGE VOCABULARY, MULTILINGUAL SPEECH RECOGNITION", In EUROSPEECH-1995, 185-188.