Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper describes an automatic language identification method based on HMMs (Hidden Markov Models) of acoustic features. Hidden Markov modeling is used to represent the dynamics of vocal-tract states. Each language has its own phonotactics. For the identification experiments, utterances in 4 languages (English, Japanese, Mandarin Chinese and Indonesian) were modeled by several HMMs. For each language, utterances from 15 male speakers were used: 10 for training the HMMs and 5 for testing. The trained HMMs showed considerable inter-language variation. The HMM topology was a fully connected (ergodic) model in which any state could transit to every state. We used 2 kinds of HMM: the DHMM (discrete HMM) with a VQ codebook, and the CHMM (continuous density HMM). The HMMs were trained using both the Baum-Welch (forward-backward) algorithm and the Viterbi algorithm; the latter was used to emphasize the state transition probabilities. For comparison, we also performed identification using VQ (vector quantization) distortion and CMDF (Continuous Mixture Density output probability Functions). The results showed that the CHMM identified the 4 languages well (the best correct identification rate was 86.3%) and performed much better than the DHMM (47.6%). In addition, the CHMM performed comparably to the CMDF. In particular, English and Japanese were discriminated almost perfectly, with an accuracy of more than 95%.
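The identification step for the DHMM case can be pictured as scoring a VQ code sequence against one ergodic HMM per language and choosing the language whose model gives the highest likelihood. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the models are already trained (training by Baum-Welch or Viterbi is not shown), uses the scaled forward algorithm for scoring, and the function names, two-state models, three-codeword codebook and all probability values are hypothetical toy choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A discrete HMM as (pi, A, B): pi[i] initial prob, A[i][j] transition prob,
# B[i][k] prob of emitting VQ codeword k in state i.
# Hypothetical representation chosen for this sketch.
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_log_likelihood(pi: List[float], A: List[List[float]],
                           B: List[List[float]], obs: Sequence[int]) -> float:
    """Return log P(obs | model) using the scaled forward algorithm.

    Assumes all probabilities used are nonzero (e.g. smoothed codebook
    statistics), so every scale factor is positive.
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    log_lik = math.log(scale)
    # Induction: in an ergodic model every state may follow every state,
    # so the sum runs over all predecessor states i.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)  # sum of log scales = log P(obs)
    return log_lik


def identify_language(models: Dict[str, Model], obs: Sequence[int]) -> str:
    """Pick the language whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda lang: forward_log_likelihood(*models[lang], obs))


# Hypothetical toy models: 2 fully connected states, 3 VQ codewords.
models: Dict[str, Model] = {
    "lang_x": ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]],
               [[0.8, 0.15, 0.05], [0.6, 0.3, 0.1]]),
    "lang_y": ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]],
               [[0.05, 0.15, 0.8], [0.1, 0.3, 0.6]]),
}

print(identify_language(models, [0, 0, 1, 0]))
```

With these toy parameters, a code sequence dominated by codeword 0 is assigned to `lang_x`, since that model emits codeword 0 with high probability in both states. Working in log space with per-frame scaling keeps the score from underflowing on long utterances; a CHMM would replace the table lookup `B[j][o]` with a continuous output density evaluated on the feature vector.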
Bibliographic reference. Nakagawa, Seiichi / Ueda, Yoshio / Seino, Takashi (1992): "Speaker-independent, text-independent language identification by HMM", In ICSLP-1992, 1011-1014.