Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Language Identification of Six Languages Based on a Common Set of Broad Phonemes

Kay M. Berkling, Etienne Barnard

Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, Portland, OR, USA

We describe a system designed to identify the language of an utterance spoken by any native speaker over the telephone. We extend our previous work, which was based on language-specific phonemes [5], to include sequences of all lengths of language-independent speech units. These units are derived by clustering phonemes across all six languages in the system (Hindi, Spanish, English, German, Japanese, and Mandarin). Our language-identification results based on broad-phoneme occurrence statistics show 90% accuracy in distinguishing English from Japanese, which is comparable to the results obtained with language-specific phonemes. Relaxing the precision of language-dependent phonemes into language-independent broad phonemes thus retains language-discriminative power. The degree to which this precision can be relaxed, while sequences of broad phonemes still discriminate between languages, indicates how accurately the phoneme segmenter and recognizer must recognize the incoming speech.
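
As a rough illustration of what "broad-phoneme occurrence statistics" could look like, the sketch below collapses fine phoneme labels into broad classes and counts every contiguous sequence of classes up to a chosen length. It is not the authors' implementation: the paper derives its broad units by clustering phonemes across the six languages, whereas the `BROAD_CLASS` table here is a hand-written, hypothetical stand-in, and the names `to_broad`, `sequence_counts`, and `max_len` are assumptions made for this example. The classification step that turns such counts into a language decision is not shown.

```python
from collections import Counter

# Hypothetical mapping from fine phoneme labels to broad classes.
# The paper obtains its broad units by clustering; this table is only
# an illustrative stand-in (V = vowel, S = stop, F = fricative,
# N = nasal, L = liquid).
BROAD_CLASS = {
    "a": "V", "i": "V", "u": "V", "e": "V", "o": "V",
    "p": "S", "t": "S", "k": "S", "b": "S", "d": "S", "g": "S",
    "s": "F", "z": "F", "f": "F", "h": "F",
    "m": "N", "n": "N",
    "l": "L", "r": "L",
}


def to_broad(phonemes):
    """Map a sequence of fine phoneme labels to broad-class labels."""
    return [BROAD_CLASS[p] for p in phonemes]


def sequence_counts(broad, max_len=3):
    """Count every contiguous sequence of broad classes of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(broad) - n + 1):
            counts[tuple(broad[i:i + n])] += 1
    return counts


# Toy utterance, written as fine phoneme labels (made up for illustration).
utterance = ["k", "a", "t", "a", "n", "a"]
broad = to_broad(utterance)
counts = sequence_counts(broad, max_len=3)
```

Counts of this kind, gathered per language from training utterances, are one plausible form of the features that a language classifier could compare; the `max_len` cap is a simplification of the "sequences of all lengths" described above.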

Bibliographic reference. Berkling, Kay M. / Barnard, Etienne (1994): "Language identification of six languages based on a common set of broad phonemes", in ICSLP-1994, 1891-1894.