Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Automatic Segmentation and Identification of Ten Languages Using Telephone Speech

Yeshwant K. Muthusamy, Ronald A. Cole

Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, Beaverton, OR, USA

This paper extends our previous work on automatic language identification using 4 languages and high-quality speech to automatic identification of 10 languages using telephone speech. The systems described here consist of two parts: (a) segmentation of telephone speech into seven broad phonetic categories and (b) classification of languages using feature measurements derived from the broad phonetic categories. Both the segmentation and classification stages use fully connected, feed-forward neural networks. When tested on new speakers from the 10 languages, the multi-language segmentation algorithm agrees with the hand labels 79.8% of the time. Classifiers were trained to identify (i) all 10 languages, (ii) each language vs. all others, (iii) the pairs English-L, where L is one of the remaining 9 languages, and (iv) the triples English-L-Other, where Other consists of the remaining 8 languages. Performance varied from 47.7% for the single 10-language network to 88.6% for the English-Tamil network. Classification performance of human listeners on short excerpts of speech is also reported.
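The four classifier configurations (i) to (iv) can be enumerated with a short sketch, which makes the label set of each task explicit. This is only an illustration of the task structure, not the authors' code: the function name `classifier_tasks` is hypothetical, and because the abstract names only English and Tamil, the other eight languages appear as placeholders `Lang3` to `Lang10`.

```python
from typing import Dict, List, Tuple


def classifier_tasks(languages: List[str], anchor: str = "English") -> Dict[str, List[Tuple[str, ...]]]:
    """Enumerate the label sets for the four classifier configurations.

    Hypothetical helper for illustration; "Other" stands for the pooled
    class of all languages not named explicitly in a given task.
    """
    others = [lang for lang in languages if lang != anchor]
    return {
        # (i) one network that distinguishes all languages at once
        "all": [tuple(languages)],
        # (ii) each language vs. all others
        "one_vs_rest": [(lang, "Other") for lang in languages],
        # (iii) the pairs English-L
        "pairs": [(anchor, lang) for lang in others],
        # (iv) the triples English-L-Other
        "triples": [(anchor, lang, "Other") for lang in others],
    }


# Placeholder names: only English and Tamil are named in the abstract.
languages = ["English", "Tamil"] + [f"Lang{i}" for i in range(3, 11)]
tasks = classifier_tasks(languages)
counts = {name: len(label_sets) for name, label_sets in tasks.items()}
print(counts)  # {'all': 1, 'one_vs_rest': 10, 'pairs': 9, 'triples': 9}
```

If each label set corresponds to one separately trained network, which the abstract does not state explicitly, the description would imply 29 networks in total.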

Bibliographic reference. Muthusamy, Yeshwant K. / Cole, Ronald A. (1992): "Automatic segmentation and identification of ten languages using telephone speech", in ICSLP-1992, 1007-1010.