First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)
Heterogeneous knowledge sources that model speech only at certain time frames are difficult to incorporate into speech recognition with standard multimodal fusion techniques. In this work, we present a new framework for integrating such sporadic knowledge into standard HMM-based ASR. In a first step, each knowledge source is mapped onto a logarithmic score using a sigmoid transfer function. These scores are then combined with the standard acoustic models by weighted linear combination. Speech recognition experiments with broad phonetic knowledge sources on a broadcast news transcription task show improved recognition results when the knowledge provides information complementary to the ASR system.
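The two steps described in the abstract (sigmoid mapping to a logarithmic score, then weighted linear combination with the acoustic score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `log_sigmoid` and `combine_scores`, the parameters `slope`, `offset` and `weight`, and the choice to leave frames without knowledge unchanged are all assumptions made for illustration.

```python
import math


def log_sigmoid(x, slope=1.0, offset=0.0):
    """Return log(sigmoid(slope * (x - offset))) in a numerically stable way.

    slope and offset are hypothetical parameters of the transfer function.
    """
    z = slope * (x - offset)
    # log(sigmoid(z)) = -log(1 + exp(-z)); branch to avoid overflow in exp.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def combine_scores(acoustic_loglik, knowledge, weight=0.5, slope=1.0, offset=0.0):
    """Combine per-frame acoustic log-likelihoods with sporadic knowledge.

    acoustic_loglik: list of log-likelihoods, one per frame, for one HMM state.
    knowledge: dict mapping frame index -> raw knowledge-source score; frames
        missing from the dict carry no knowledge (assumption: they are left
        unchanged).
    weight: hypothetical weight of the knowledge score in the linear combination.
    """
    combined = []
    for t, loglik in enumerate(acoustic_loglik):
        if t in knowledge:
            loglik = loglik + weight * log_sigmoid(knowledge[t], slope, offset)
        combined.append(loglik)
    return combined


# Knowledge is available only at frame 1; frames 0 and 2 keep their scores.
scores = combine_scores([-1.0, -2.0, -3.0], {1: 0.0}, weight=1.0)
```

In this sketch a raw score of 0 maps to log(0.5), so frame 1 is penalized by about 0.69 in the log domain while the other frames are untouched. In practice the weight and the sigmoid parameters would presumably be tuned on held-out data.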
Index Terms: multimodal fusion, landmark-driven ASR, event-based speech recognition
Bibliographic reference. Ziegler, Stefan / Gravier, Guillaume (2013): "A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition", In SLAM-2013, 37-42.