INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Effective Triphone Mapping for Acoustic Modeling in Speech Recognition

Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo

Slovak Academy of Sciences, Slovak Republic

This paper presents an effective triphone mapping technique for acoustic model training in automatic speech recognition that allows the synthesis of unseen triphones. This data-driven model clustering is described, together with experiments performed on 350 hours of a Slovak audio database of mixed read and spontaneous speech. The proposed technique is compared with tree-based state tying, and it is shown that for bigger acoustic models, with 4000 states or more, a triphone-mapped HMM system achieves better performance than a tree-based state tying system. The main gain in performance is due to latent application of triphone mapping on monophones with multiple Gaussian pdfs, so the cloned triphones are initialized better than with single-Gaussian monophones. The absolute decrease in word error rate was 0.46% (5.73% relative) for models with 7500 states, and the gain decreased to 0.4% (5.17% relative) at 11500 states.
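
The abstract reports each gain as both an absolute and a relative reduction in word error rate, without stating the baseline figures. As a rough sanity check, the baseline can be back-computed from the two numbers. The sketch below is only an illustration: the function names are hypothetical, the inputs are the rounded values quoted in the abstract, and the results are therefore approximate rather than figures reported by the authors.

```python
def implied_baseline_wer(absolute_gain: float, relative_gain: float) -> float:
    """Return the baseline WER (in %) implied by a reported gain.

    absolute_gain: WER reduction in percentage points (e.g. 0.46).
    relative_gain: the same reduction as a percentage of the baseline (e.g. 5.73).
    Since relative = 100 * absolute / baseline, baseline = 100 * absolute / relative.
    """
    return 100.0 * absolute_gain / relative_gain


def implied_improved_wer(absolute_gain: float, relative_gain: float) -> float:
    """Return the WER (in %) of the improved system implied by the same two numbers."""
    return implied_baseline_wer(absolute_gain, relative_gain) - absolute_gain


if __name__ == "__main__":
    # Values quoted in the abstract: (number of states, absolute gain, relative gain).
    reported = [(7500, 0.46, 5.73), (11500, 0.40, 5.17)]
    for states, abs_gain, rel_gain in reported:
        base = implied_baseline_wer(abs_gain, rel_gain)
        improved = implied_improved_wer(abs_gain, rel_gain)
        print(f"{states} states: baseline ~{base:.2f}% WER, mapped ~{improved:.2f}% WER")
```

Under these assumptions the implied tree-based baselines are roughly 8.0% WER at 7500 states and roughly 7.7% WER at 11500 states; because the 0.4% figure is rounded to one decimal place, the second estimate is the less precise of the two.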

Bibliographic reference.  Darjaa, Sakhia / Cerňak, Miloš / Trnka, Marián / Rusko, Milan / Sabo, Róbert (2011): "Effective triphone mapping for acoustic modeling in speech recognition", In INTERSPEECH-2011, 1717-1720.