Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Evaluation of Voice Activity Detection by Combining Multiple Features with Weight Adaptation

Yusuke Kida, Tatsuya Kawahara

Kyoto University, Japan

For noise-robust automatic speech recognition (ASR), we propose a novel voice activity detection (VAD) method based on a combination of multiple features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero-crossing rate, spectral information, and Gaussian mixture model (GMM) likelihood. The combination weights are adaptively updated using minimum classification error (MCE) training. In this paper, we first investigate the effect of adapting the combination weights and the GMM parameters, and demonstrate that the weights can be effectively adapted with a single utterance. We then apply the method to ASR and confirm that it significantly outperforms conventional methods in various noise conditions.
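
To make the idea of a weighted feature combination with MCE weight adaptation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each of the four features has already been converted to a per-frame score where larger values indicate speech, that the decision is a simple threshold on the weighted sum, and that the MCE loss is a sigmoid of the misclassification measure minimized by one gradient step per frame. The function names, the learning rate `lr`, the slope `gamma`, and the renormalization of the weights to sum to one are all illustrative assumptions.

```python
import math


def combined_score(weights, features):
    """Weighted sum of per-feature speech scores for one frame."""
    return sum(w * f for w, f in zip(weights, features))


def is_speech(weights, features, threshold=0.0):
    """Classify a frame as speech when the combined score exceeds the threshold."""
    return combined_score(weights, features) > threshold


def mce_update(weights, features, label_is_speech,
               threshold=0.0, lr=0.1, gamma=1.0):
    """One gradient step on a sigmoid MCE loss (hypothetical formulation).

    The misclassification measure d is positive when the frame is
    misclassified: d = -(score - threshold) for a speech frame and
    d = +(score - threshold) for a noise frame.
    """
    margin = combined_score(weights, features) - threshold
    sign = -1.0 if label_is_speech else 1.0
    d = sign * margin
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    # Chain rule: dloss/dw_k = gamma * loss * (1 - loss) * sign * f_k
    grad_scale = gamma * loss * (1.0 - loss) * sign
    # Assumed constraint: keep weights positive, then renormalize to sum to 1.
    updated = [max(w - lr * grad_scale * f, 1e-6)
               for w, f in zip(weights, features)]
    total = sum(updated)
    return [w / total for w in updated]


# Hypothetical frame scores, in the order: amplitude, zero-crossing rate,
# spectral information, GMM likelihood.
frame = [1.0, -0.5, 0.2, 0.8]
weights = [0.25, 0.25, 0.25, 0.25]
adapted = mce_update(weights, frame, label_is_speech=True)
```

In this example the frame is labeled as speech, so the step shifts weight toward the features that scored it as speech (amplitude and GMM likelihood here) and away from the one that did not. Repeating the update over the labeled frames of an utterance is one way such weights could be adapted online.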

Bibliographic reference.  Kida, Yusuke / Kawahara, Tatsuya (2006): "Evaluation of voice activity detection by combining multiple features with weight adaptation", In INTERSPEECH-2006, paper 1152-Wed3A1O.4.