Auditory-Visual Speech Processing (AVSP) 2010

Hakone, Kanagawa, Japan
September 30-October 3, 2010

Decision Fusion by Boosting Method for Multi-Modal Voice Activity Detection

Shin'ichi Takeuchi (1), Takashi Hashiba (2), Satoshi Tamura (3), Satoru Hayamizu (3)

(1) R&D Center for Human Medical Engineering, Gifu Univ., Japan
(2) Graduate School of Engineering, Gifu Univ., Japan
(3) Faculty of Engineering, Gifu Univ., Japan

In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. In multi-modal (speech) signal processing, there are two methods for fusing the audio and the visual information: concatenating the audio and visual features (feature fusion), and employing audio-only and visual-only classifiers, then fusing the unimodal decisions (decision fusion). We investigate the effectiveness of decision fusion based on the results from AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers. It classifies input data into two classes based on the weighted results from the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
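
The weighted-vote idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weak classifiers or the features, so this example assumes each frame is summarized by two hypothetical unimodal scores (audio_score, visual_score), uses threshold "decision stumps" as weak classifiers, and trains on made-up toy data. The names stump, train_adaboost, and predict are illustrative only.

```python
import math

def stump(x, j, thr, pol):
    """Weak classifier: vote `pol` if feature j reaches the threshold, else -pol."""
    return pol if x[j] >= thr else -pol

def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost over decision stumps.

    X: list of [audio_score, visual_score] per frame (assumed features).
    y: labels, +1 for speech and -1 for non-speech.
    Returns a list of (alpha, feature_index, threshold, polarity).
    """
    n = len(X)
    w = [1.0 / n] * n  # uniform sample weights to start
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump(x, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, thr, pol))
        # Increase the weight of misclassified frames, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(x, j, thr, pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Fused decision: sign of the alpha-weighted sum of weak votes."""
    score = sum(alpha * stump(x, j, thr, pol) for alpha, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy frames (made up): neither modality alone separates the two classes.
X = [[0.9, 0.8], [0.8, 0.2], [0.3, 0.9],   # speech
     [0.2, 0.1], [0.1, 0.3], [0.4, 0.4]]   # non-speech
y = [1, 1, 1, -1, -1, -1]

model = train_adaboost(X, y, rounds=3)
for alpha, j, thr, pol in model:
    name = "audio" if j == 0 else "visual"
    print(f"alpha={alpha:.3f}  {name} >= {thr} -> {pol:+d}")
print([predict(model, x) for x in X])
```

In this toy setting, no single threshold on either score classifies every frame correctly, while the weighted combination of three stumps does. A real system would replace the two scores with the outputs of trained audio-only and visual-only classifiers.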

Index Terms: voice activity detection, VAD, multi-modal

Bibliographic reference. Takeuchi, Shin'ichi / Hashiba, Takashi / Tamura, Satoshi / Hayamizu, Satoru (2010): "Decision fusion by boosting method for multi-modal voice activity detection", in AVSP-2010, paper S1-4.