Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Language Identification with Embedded Word Models

Padma Ramesh, David B. Roe

Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ, USA

This paper presents results on acoustic identification of the language being spoken. Conventional approaches to spoken language identification aim to determine the language used by any speaker, on any subject, over any transmission channel; such systems have typically achieved accuracies of around 80% when identifying among 10 languages. The new feature of this work is the use of embedded models of frequently occurring words and phrases, in addition to a conventional Hidden Markov Model (HMM) of each language to be identified. Experimental results on four languages indicate a substantial improvement in accuracy, especially when one or more of the key words is present in the speech to be identified. For phrases containing one of the explicitly modeled key words, the correct language is identified 93% of the time, with utterances averaging 2 seconds in length. We also report results on language identification under cross-channel conditions, in which the language models are created from speech collected over one channel and the test data are collected over a different channel. Adaptive background modeling and spectral correction can raise the accuracy in these challenging conditions.
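The abstract does not describe how the key-word models are combined with each language's background HMM. A common arrangement in keyword spotting, assumed here purely for illustration, embeds a key-word HMM between two background (filler) loops, scores the utterance by its Viterbi best path, and picks the language whose model scores highest. The sketch below uses made-up per-frame log-likelihoods (`bg`, `kw`) instead of real acoustic models, hypothetical helper names (`embedded_keyword_score`, `language_score`, `identify_language`), and ignores transition probabilities apart from a single hypothetical key-word entry `penalty`. It is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = -math.inf


def embedded_keyword_score(bg: Sequence[float],
                           kw: Sequence[Sequence[float]],
                           penalty: float = 0.0) -> float:
    """Viterbi best-path log score through the toy network
        background* -> key-word state 0 -> ... -> state S-1 -> background*
    or a background-only path over the whole utterance.

    bg[t]    -- log-likelihood of frame t under the background model
    kw[s][t] -- log-likelihood of frame t in key-word state s (left-to-right)
    penalty  -- log-domain cost added once when the key-word path is entered
    """
    S = len(kw)
    pre = 0.0            # leading background; 0.0 also serves as the start score
    k = [NEG_INF] * S    # best score ending inside the key word, in state s
    post = NEG_INF       # trailing background after the key word
    for t, b in enumerate(bg):
        new_k = [NEG_INF] * S
        # enter the key word from leading background, or stay in state 0
        new_k[0] = max(k[0], pre + penalty) + kw[0][t]
        for s in range(1, S):
            # stay in state s, or advance from state s-1 (previous-frame scores)
            new_k[s] = max(k[s], k[s - 1]) + kw[s][t]
        post = max(post, k[-1]) + b   # leave the key word, or stay in trailing bg
        pre = pre + b                 # remain in leading background
        k = new_k
    # a path may end in leading background (no key word), in the final
    # key-word state, or in trailing background
    return max(pre, k[-1], post)


def language_score(bg: Sequence[float],
                   keywords: List[Sequence[Sequence[float]]],
                   penalty: float = 0.0) -> float:
    """Best score over a language's key-word models; the background-only
    path is always allowed, so utterances without key words still score."""
    best = sum(bg)
    for kw in keywords:
        best = max(best, embedded_keyword_score(bg, kw, penalty))
    return best


def identify_language(scores: Dict[str, float]) -> str:
    """Pick the language whose model gives the highest log score."""
    return max(scores, key=scores.get)


# Toy 4-frame utterance with invented scores for two languages, A and B.
bg_a = [-2.0, -2.0, -2.0, -2.0]
kw_a = [[-5.0, -1.0, -5.0, -5.0],   # key-word state 0 fits frame 1
        [-5.0, -5.0, -1.0, -5.0]]   # key-word state 1 fits frame 2
bg_b = [-1.8, -1.8, -1.8, -1.8]
kw_b = [[-6.0] * 4, [-6.0] * 4]     # B's key word does not match anywhere

without_kw = {"A": language_score(bg_a, []), "B": language_score(bg_b, [])}
with_kw = {"A": language_score(bg_a, [kw_a]), "B": language_score(bg_b, [kw_b])}
print(identify_language(without_kw), identify_language(with_kw))  # B A
```

In this invented example the background models alone slightly favor B, but a good key-word match lets A win, which mirrors in miniature the abstract's point that accuracy improves most when a modeled key word is present. Setting `penalty=-3.0` makes A's key-word path (-9.0) worse than its background-only path (-8.0), so the decision falls back to B; such a penalty is one way to trade key-word false alarms against misses.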


Bibliographic reference.  Ramesh, Padma / Roe, David B. (1994): "Language identification with embedded word models", In ICSLP-1994, 1887-1890.