Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Adaptive Multimodal Fusion by Uncertainty Compensation

Vassilis Pitsikalis, Athanassios Katsamanis, George Papandreou, Petros Maragos

National Technical University of Athens, Greece

While the accuracy of feature measurements heavily depends on changing environmental conditions, the consequences of this fact for pattern recognition tasks have received relatively little attention to date. In this work we explicitly take feature measurement uncertainty into account and show how classification rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audio-visual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty of each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules that are widely applicable and easy to implement. We further show that, under certain assumptions, previous multimodal fusion methods relying on stream weights are special cases of our scheme; this provides novel insights into their applicability to various tasks and suggests new practical ways of estimating the stream weights adaptively. The potential of our approach is demonstrated in audio-visual speech recognition using either synchronous or asynchronous models.
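The idea of adjusting a classification rule for measurement uncertainty can be illustrated with a small sketch. This is a minimal Python illustration, not the authors' implementation. It assumes diagonal Gaussian class models for each stream, additive zero-mean Gaussian measurement noise with a known per-stream variance that is independent of the clean feature, and streams that are conditionally independent given the class. Under these assumptions the noisy observation is again Gaussian, with the noise variance added to the model variance. The names MODELS, compensated_loglik and classify, and all numbers, are hypothetical.

```python
import math

# Hypothetical per-class models: for each class, one (mean, variance) pair per
# stream (stream 0 = audio, stream 1 = visual), each a diagonal Gaussian.
MODELS = {
    "a": [([0.0], [1.0]), ([0.0], [1.0])],
    "b": [([2.0], [1.0]), ([1.0], [1.0])],
}

def diag_gauss_loglik(y, mean, var):
    """Log-density of vector y under a diagonal Gaussian N(mean, diag(var))."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def compensated_loglik(y, mean, var, noise_var):
    """Assumed model: y = x + e with x ~ N(mean, var) and independent
    e ~ N(0, noise_var), so y ~ N(mean, var + noise_var)."""
    inflated = [v + n for v, n in zip(var, noise_var)]
    return diag_gauss_loglik(y, mean, inflated)

def classify(observations, models, noise_vars):
    """Return the class maximizing the sum of per-stream compensated
    log-likelihoods (streams assumed conditionally independent)."""
    def score(cls):
        return sum(
            compensated_loglik(y, mean, var, nv)
            for y, (mean, var), nv in zip(observations, models[cls], noise_vars)
        )
    return max(models, key=score)

obs = [[2.0], [0.0]]  # audio evidence favours "b", visual evidence favours "a"
print(classify(obs, MODELS, [[0.0], [0.0]]))    # clean audio -> b
print(classify(obs, MODELS, [[100.0], [0.0]]))  # very noisy audio -> a
```

With clean audio the audio stream dominates and class "b" is chosen. Inflating the audio noise variance shrinks the audio stream's log-likelihood differences between classes, so the visual stream decides. In this sketch that down-weighting behaves qualitatively like a reduced stream weight, in line with the abstract's remark about stream-weight methods; the exact correspondence derived in the paper is not reproduced here.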


Bibliographic reference.  Pitsikalis, Vassilis / Katsamanis, Athanassios / Papandreou, George / Maragos, Petros (2006): "Adaptive multimodal fusion by uncertainty compensation", In INTERSPEECH-2006, paper 1950-Thu2WeO.2.