Auditory-Visual Speech Processing (AVSP) 2010

Hakone, Kanagawa, Japan
September 30-October 3, 2010

Audio-Visual Speech Recognition System for a Robot

Takami Yoshida (1), Kazuhiro Nakadai (1,2)

(1) Graduate School of Information Science and Engineering, Tokyo Institute of Technology
(2) Honda Research Institute Japan

Automatic Speech Recognition (ASR) for a robot must be robust to noise because robots operate in noisy environments. Audio-Visual (AV) integration is one of the key ideas for improving robustness in such environments. This paper proposes an ASR system for a robot that applies AV integration to both Voice Activity Detection (VAD) and speech decoding. In VAD, we apply AV integration based on a Bayesian network; in speech decoding, we apply AV integration based on stream weights. We implemented a prototype AV-ASR system based on the proposed method and evaluated it under several conditions. Preliminary results showed that the proposed system improves the robustness of ASR even when the audio or visual input is contaminated.

Index Terms: audio-visual integration, speech recognition, voice activity detection
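
As a rough illustration of the stream-weight idea mentioned in the abstract, the sketch below combines per-hypothesis audio and visual log-likelihoods with a weighted sum and picks the best-scoring hypothesis. It is a minimal sketch of the generic stream-weight formulation, not the authors' implementation: the function names, the constraint that the two weights sum to one, and the toy scores are all assumptions made for this example.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of audio and visual log-likelihoods per hypothesis.

    Assumption for this sketch: the visual weight is 1 - audio_weight,
    a common convention for stream weights (not confirmed by the source).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        hyp: audio_weight * audio_ll[hyp] + visual_weight * visual_ll[hyp]
        for hyp in audio_ll
    }


def decode(audio_ll, visual_ll, audio_weight):
    """Return the hypothesis with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the audio stream favours "no", the visual stream "yes".
audio_ll = {"yes": -2.0, "no": -1.0}
visual_ll = {"yes": -0.5, "no": -3.0}

# Trusting audio (clean acoustics) versus trusting video (noisy acoustics).
clean_audio_choice = decode(audio_ll, visual_ll, audio_weight=0.9)
noisy_audio_choice = decode(audio_ll, visual_ll, audio_weight=0.2)
```

Lowering the audio weight when the acoustic signal is unreliable shifts the decision toward the visual stream, which is the intuition behind using stream weights for robustness.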


Bibliographic reference. Yoshida, Takami / Nakadai, Kazuhiro (2010): "Audio-visual speech recognition system for a robot", in AVSP-2010, paper S1-2.