First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)

Marseille, France
August 22-23, 2013

A Framework for Integrating Heterogeneous Sporadic Knowledge Sources into Automatic Speech Recognition

Stefan Ziegler, Guillaume Gravier

CNRS-IRISA, Campus de Beaulieu, Rennes, France

Heterogeneous knowledge sources that model speech only at certain time frames are difficult to incorporate into speech recognition with standard multimodal fusion techniques. In this work, we present a new framework for integrating such sporadic knowledge into standard HMM-based ASR. In a first step, each knowledge source is mapped onto a logarithmic score by using a sigmoid transfer function. These scores are then combined with the standard acoustic models by weighted linear combination. Speech recognition experiments with broad phonetic knowledge sources on a broadcast news transcription task show improved recognition results when the knowledge provides information complementary to the ASR system.
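
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `slope` and `offset` parameters of the sigmoid, the per-source weights, and the convention of using `None` for a source that provides no evidence at a given frame are all assumptions made for illustration.

```python
import math


def sigmoid_log_score(confidence, slope=1.0, offset=0.0):
    """Map a raw knowledge-source output to a log-domain score (always <= 0).

    Computes log(sigmoid(slope * (confidence - offset))) in a numerically
    stable way. `slope` and `offset` are hypothetical tuning parameters.
    """
    z = slope * (confidence - offset)
    if z >= 0:
        # log(1 / (1 + exp(-z))) = -log(1 + exp(-z))
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite as z - log(1 + exp(z)) to avoid overflow.
    return z - math.log1p(math.exp(z))


def combine_scores(acoustic_log_likelihood, knowledge_scores, weights):
    """Weighted linear combination of the acoustic score and sporadic scores.

    `knowledge_scores` holds one entry per knowledge source for the current
    frame; an entry of None (assumed convention) means the source is silent
    at this frame and contributes nothing.
    """
    total = acoustic_log_likelihood
    for score, weight in zip(knowledge_scores, weights):
        if score is not None:
            total += weight * score
    return total


# Example frame: the first source is silent, the second fires with a
# neutral confidence of 0.0.
frame_score = combine_scores(
    acoustic_log_likelihood=-10.0,
    knowledge_scores=[None, sigmoid_log_score(0.0)],
    weights=[0.5, 2.0],
)
```

Because silent sources are skipped, frames without any sporadic evidence keep the plain acoustic score, which is one simple way to handle knowledge that is only available at certain time frames.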

Index Terms: multimodal fusion, landmark-driven ASR, event-based speech recognition

Bibliographic reference. Ziegler, Stefan / Gravier, Guillaume (2013): "A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition", In SLAM-2013, 37-42.