EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

Comparing Parameter Tying Methods for Multilingual Acoustic Modelling

Mikko Harju (1), Petri Salmela (1), Jussi Leppänen (1), Olli Viikki (2), Jukka Saarinen (1)

(1) Tampere University of Technology, Finland
(2) Nokia Research Center, Finland

In this paper, we compare state-level and model-level tying of continuous density hidden Markov models for multilingual acoustic modelling. Using model-level tying, the number of language-dependent (LD) phoneme models of five European languages was reduced to a desired number. The tying was based on a dissimilarity measure between the LD phoneme models, applied in a bottom-up agglomerative clustering procedure. The resulting system achieved 87.3% word recognition accuracy on the test set, while a comparable multilingual recognizer based on the SAMPA phone inventory obtained 84.6% on the same set. The same model-level tying technique was also used to derive an alternative phone inventory to SAMPA, such that both inventories have an equal number of phones for these five languages. When state-level tying was used to reduce the number of parameters from 199k to 76k in both systems, the multilingual recognition systems trained for the SAMPA and alternative phone inventories obtained 80.9% and 83.7% word accuracy, respectively, on the same test set. The original LD recognition systems obtained an 89.0% recognition rate on the same test set, which contained approximately 200 isolated words from the SpeechDat(II) databases for each of the five languages. Test set results are also given for the recognition systems after MAP language adaptation of the multilingual phone models.
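The bottom-up agglomerative clustering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the dissimilarity measure or the linkage rule, so the code below makes loud assumptions: each phoneme model is collapsed to a single diagonal-covariance Gaussian, the dissimilarity is a symmetric Kullback-Leibler divergence, and clusters are compared by complete linkage. The names `sym_kl_diag`, `cluster_models`, and the toy model labels are hypothetical and are not taken from the paper.

```python
import numpy as np


def sym_kl_diag(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians.

    Assumption: used here only as a stand-in for the paper's
    (unspecified) dissimilarity measure.
    """
    diff2 = (mean_a - mean_b) ** 2
    kl_ab = 0.5 * np.sum(np.log(var_b / var_a) + (var_a + diff2) / var_b - 1.0)
    kl_ba = 0.5 * np.sum(np.log(var_a / var_b) + (var_b + diff2) / var_a - 1.0)
    return kl_ab + kl_ba


def cluster_models(models, target_count):
    """Merge models bottom-up until only `target_count` clusters remain.

    models: dict mapping a model name to a (mean, variance) pair of arrays.
    Returns a list of sets of model names, one set per tied cluster.
    """
    clusters = [{name} for name in sorted(models)]

    def linkage(c1, c2):
        # Complete linkage (an assumption): largest pairwise dissimilarity.
        return max(sym_kl_diag(*models[a], *models[b]) for a in c1 for b in c2)

    while len(clusters) > target_count:
        best = None
        # Find the closest pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so index i stays valid.
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


# Toy example with hypothetical language-dependent models.
toy_models = {
    "fi_a": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "de_a": (np.array([0.1, 0.0]), np.array([1.0, 1.0])),
    "fi_s": (np.array([5.0, 5.0]), np.array([1.0, 1.0])),
    "en_s": (np.array([5.2, 5.0]), np.array([1.0, 1.0])),
}
tied = cluster_models(toy_models, target_count=2)
```

In this toy setup, the two vowel-like models and the two fricative-like models end up tied together, which mirrors the idea of reducing the LD model set to a desired size. A real system would compare full multi-state, multi-mixture HMMs rather than single Gaussians.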

Bibliographic reference.  Harju, Mikko / Salmela, Petri / Leppänen, Jussi / Viikki, Olli / Saarinen, Jukka (2001): "Comparing parameter tying methods for multilingual acoustic modelling", In EUROSPEECH-2001, 2729-2732.