EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Crosslingual Speech Recognition with Multilingual Acoustic Models Based on Agglomerative and Tree-Based Triphone Clustering

Andrej Zgank (1), Bojan Imperl (1), Finn Tore Johansen (2), Zdravko Kacic (1), Bogomir Horvat (1)

(1) University of Maribor, Slovenia
(2) Telenor Research and Development, Norway

The paper describes our ongoing work on crosslingual speech recognition based on multilingual triphone hidden Markov models. Multilingual acoustic models were built using two different clustering procedures: agglomerative triphone clustering and tree-based triphone clustering. The agglomerative clustering procedure measures the similarity of triphones at the phoneme level, with the monophone similarity estimated by the Houtgast algorithm. The tree-based clustering procedure is based on common broad classes. The Slovenian, German and Spanish 1000 FDB SpeechDat(II) databases were used for training. Crosslingual speech recognition was evaluated on the Norwegian 1000 FDB SpeechDat(II) database, with no adaptation or training on the Norwegian data. Norwegian phonemes were mapped using the IPA scheme. Five different Norwegian recognition vocabularies were generated. The best crosslingual system achieved a recognition rate of 45.03%, while the reference Norwegian system achieved 78.32%.
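
The agglomerative procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The monophone similarity table is a hypothetical stand-in for values the Houtgast algorithm would provide, and both the position-wise averaging used for triphone similarity and the average-linkage merge rule with a fixed threshold are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical monophone similarities in [0, 1]. In the paper these would be
# estimated by the Houtgast algorithm, which is NOT implemented here.
TOY_SIMILARITY = {
    frozenset(("t", "d")): 0.8,
    frozenset(("i", "e")): 0.7,
}

def mono_sim(p, q):
    """Similarity of two monophones: 1.0 if identical, else table lookup."""
    if p == q:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((p, q)), 0.0)

def triphone_similarity(t1, t2):
    """Assumed measure: mean of left, centre and right monophone similarity."""
    return sum(mono_sim(a, b) for a, b in zip(t1, t2)) / 3.0

def cluster_similarity(c1, c2):
    """Average linkage (an assumption): mean similarity over all member pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(triphone_similarity(a, b) for a, b in pairs) / len(pairs)

def agglomerate(triphones, threshold):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[t] for t in triphones]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        if cluster_similarity(clusters[i], clusters[j]) < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy triphones written as (left context, centre phoneme, right context).
triphones = [("s", "a", "t"), ("s", "a", "d"), ("m", "i", "n"), ("m", "e", "n")]
clusters = agglomerate(triphones, threshold=0.85)
print(clusters)
```

With these toy values, triphones that differ only in one similar phoneme end up sharing a cluster, while unrelated triphones stay apart. The threshold controls how many shared multilingual models result.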


Bibliographic reference. Zgank, Andrej / Imperl, Bojan / Johansen, Finn Tore / Kacic, Zdravko / Horvat, Bogomir (2001): "Crosslingual speech recognition with multilingual acoustic models based on agglomerative and tree-based triphone clustering", in EUROSPEECH-2001, 2725-2729.