13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise

Shigeki Matsuda (1), Naoya Ito (2), Kosuke Tsujino (3), Hideki Kashioka (1), Shigeki Sagayama (2)

(1) Spoken Language Communication Laboratory, National Institute of Information and Communication Technology (NICT), Kyoto, Japan
(2) Graduate School of Information Science and Technology, University of Tokyo, Japan
(3) Research Laboratories, NTT DOCOMO Inc., Japan

In this paper, we propose a speaker-dependent voice activity detection (VAD) algorithm that extracts only the speech periods uttered by a target user. We surveyed recognition errors in real speech data collected with "VoiceTra", a speech-to-speech translation system for smartphones, and found many word insertion errors caused by the speech of background speakers. Our VAD consists of three GMMs: a noise GMM and a speech GMM, as used in a traditional GMM-based VAD, plus a speaker-adapted GMM. This design makes it straightforward to detect the speech of the target speaker alone. Experiments on test utterances containing background speakers' speech showed that an ASR system using the proposed VAD achieved better recognition performance than an ASR system using the conventional VAD.
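The abstract names the three GMMs but does not say how their scores are combined. The Python sketch below shows one plausible frame-level rule. It assumes diagonal-covariance GMMs and a two-stage log-likelihood-ratio test. The first stage decides speech versus noise, as in a conventional GMM-based VAD. The second stage compares the speaker-adapted GMM with the speaker-independent speech GMM to separate the target user from background talkers. The `DiagGMM` class, the `classify_frames` function, the zero thresholds, and the toy one-dimensional model parameters are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Hypothetical GMM container with diagonal covariances."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Return log p(x) for a single feature vector x."""
        comp_scores = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # log of the weighted diagonal Gaussian density of this component
            s = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                s += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_scores.append(s)
        # log-sum-exp over components for numerical stability
        m = max(comp_scores)
        return m + math.log(sum(math.exp(c - m) for c in comp_scores))


def classify_frames(frames, noise_gmm, speech_gmm, target_gmm,
                    speech_threshold=0.0, target_threshold=0.0):
    """Label each frame 'noise', 'background', or 'target'.

    ASSUMED two-stage rule; the paper's actual decision rule may differ.
    """
    labels = []
    for x in frames:
        ll_noise = noise_gmm.log_likelihood(x)
        ll_speech = speech_gmm.log_likelihood(x)
        ll_target = target_gmm.log_likelihood(x)
        # Stage 1: speech vs. noise, using the better of the two speech models
        if max(ll_speech, ll_target) - ll_noise <= speech_threshold:
            labels.append("noise")
        # Stage 2: is this speech better explained by the speaker-adapted GMM?
        elif ll_target - ll_speech > target_threshold:
            labels.append("target")
        else:
            labels.append("background")
    return labels


if __name__ == "__main__":
    # Toy 1-D single-component models (hypothetical parameters)
    noise = DiagGMM([1.0], [[0.0]], [[1.0]])
    speech = DiagGMM([1.0], [[5.0]], [[4.0]])   # broad, speaker-independent
    target = DiagGMM([1.0], [[6.0]], [[1.0]])   # narrow, "adapted" to the user
    print(classify_frames([[0.0], [6.0], [3.0]], noise, speech, target))
    # prints ['noise', 'target', 'background']
```

With these toy models, a frame near the noise mean is rejected. A frame near the target speaker's mean is accepted. A frame that fits the broad speech model better than the speaker-adapted model is labeled as background speech. A real system would also smooth the frame labels into segments before passing them to the recognizer.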

Index Terms: voice activity detection, speech recognition


Bibliographic reference.  Matsuda, Shigeki / Ito, Naoya / Tsujino, Kosuke / Kashioka, Hideki / Sagayama, Shigeki (2012): "Speaker-dependent voice activity detection robust to background speech noise", In INTERSPEECH-2012, 2626-2629.