Auditory-Visual Speech Processing (AVSP) 2010
Hakone, Kanagawa, Japan
In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. In multi-modal speech signal processing, there are two methods for fusing the audio and the visual information: concatenating the audio and visual features, or employing audio-only and visual-only classifiers and then fusing the unimodal decisions. We investigate the effectiveness of decision fusion based on AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers; it classifies input data into two classes based on the weighted results of the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
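To illustrate the AdaBoost principle described above, the following is a minimal sketch of boosting with one-feature threshold stumps and a weighted-vote final decision. It is not the authors' implementation: the stump learner, the two-dimensional (audio score, visual score) feature layout, and the toy thresholds are all illustrative assumptions.

```python
import math

def train_stump(X, y, w):
    """Find the best single-feature threshold stump under sample weights w.
    Returns (weighted_error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds=5):
    """Return a list of (alpha, feature, threshold, polarity) weak classifiers."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)              # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Re-weight: increase the weight of misclassified samples.
        for i in range(n):
            pred = pol if X[i][f] >= thr else -pol
            w[i] *= math.exp(-alpha * y[i] * pred)
        s = sum(w)
        w = [wi / s for wi in w]           # renormalize
    return ensemble

def predict(ensemble, x):
    """Final decision: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * (pol if x[f] >= thr else -pol)
                for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy usage: each sample is a hypothetical (audio score, visual score) pair,
# labeled +1 for speech and -1 for non-speech.
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
y = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(X, y, rounds=3)
```

In the decision-fusion setting, the weak classifiers would be replaced by the unimodal (audio-only and visual-only) VAD decisions; AdaBoost then learns how heavily to weight each modality's vote.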
Index Terms: voice activity detection, VAD, multi-modal
Bibliographic reference. Takeuchi, Shin'ichi / Hashiba, Takashi / Tamura, Satoshi / Hayamizu, Satoru (2010): "Decision fusion by boosting method for multi-modal voice activity detection", In AVSP-2010, paper S1-4.