EUROSPEECH 2001 Scandinavia
This paper describes our ongoing work on crosslingual speech recognition based on multilingual triphone hidden Markov models. Multilingual acoustic models were built using two different clustering procedures: agglomerative triphone clustering and tree-based triphone clustering. The agglomerative clustering procedure measures the similarity of triphones at the phoneme level, where the monophone similarity is estimated by the Houtgast algorithm. The tree-based clustering procedure is based on common broad classes. The Slovenian, German and Spanish 1000 FDB SpeechDat(II) databases were used for training. Crosslingual speech recognition was performed on the Norwegian 1000 FDB SpeechDat(II) database. No adaptation or training with the Norwegian database was used. Norwegian phonemes were mapped using the IPA scheme. Five different Norwegian recognition vocabularies were generated. The best crosslingual system achieved a recognition rate of 45.03%, while the reference Norwegian system achieved 78.32%.
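As an illustration of measuring triphone similarity at the phoneme level, the following is a minimal sketch and not the authors' implementation. It assumes that a triphone is a (left context, centre, right context) tuple and that triphone similarity is the plain average of the three position-wise monophone similarities; the abstract does not specify the combination rule. The `MONO_SIM` table and its values are hypothetical placeholders for similarities that the paper estimates with the Houtgast algorithm, and the function names are invented for this example.

```python
from itertools import combinations

# Hypothetical monophone similarities in [0, 1]. In the paper these are
# estimated from data (Houtgast algorithm); the values here are made up.
MONO_SIM = {
    frozenset(("s", "z")): 0.6,
    frozenset(("t", "d")): 0.5,
}

def mono_sim(p, q):
    """Similarity of two monophones; identical phonemes score 1.0."""
    if p == q:
        return 1.0
    return MONO_SIM.get(frozenset((p, q)), 0.0)

def triphone_sim(t1, t2):
    """Assumed rule: average monophone similarity over the three positions."""
    return sum(mono_sim(p, q) for p, q in zip(t1, t2)) / 3.0

def most_similar_pair(triphones):
    """Pick the pair an agglomerative step would consider merging first."""
    return max(combinations(triphones, 2), key=lambda pair: triphone_sim(*pair))
```

An agglomerative procedure would repeatedly call something like `most_similar_pair`, merge the returned triphones into one cluster, and stop once the best similarity falls below a chosen threshold.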
Bibliographic reference. Zgank, Andrej / Imperl, Bojan / Johansen, Finn Tore / Kacic, Zdravko / Horvat, Bogomir (2001): "Crosslingual speech recognition with multilingual acoustic models based on agglomerative and tree-based triphone clustering", In EUROSPEECH-2001, 2725-2729.