INTERSPEECH 2006 - ICSLP
For noise-robust automatic speech recognition (ASR), we propose a novel voice activity detection (VAD) method that combines multiple features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model (GMM) likelihood. The combination weights are adaptively updated using minimum classification error (MCE) training. In this paper, we first investigate the effect of adapting the combination weights and GMM parameters, and demonstrate that the weights can be effectively adapted with a single utterance. We then apply the method to ASR and confirm that the proposed method significantly outperforms conventional methods in various noise conditions.
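The weighted combination and its MCE-style adaptation can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the four features are taken to be pre-computed per-frame scores (positive meaning speech-like), the loss is a sigmoid of the misclassification measure, and the names `combined_score`, `mce_update`, `lr`, `slope`, and `threshold` are hypothetical. Clipping and renormalising the weights so that they sum to one is likewise an assumed constraint, not a detail confirmed by the source.

```python
import math


def combined_score(weights, features):
    """Weighted sum of per-frame feature scores (assumed: > 0 means speech-like)."""
    return sum(w * f for w, f in zip(weights, features))


def mce_update(weights, features, is_speech, lr=0.1, slope=1.0, threshold=0.0):
    """One gradient step on a sigmoid MCE-style loss for a single labelled frame.

    All hyperparameters (lr, slope, threshold) are illustrative assumptions.
    """
    sign = 1.0 if is_speech else -1.0
    # Misclassification measure: negative when the frame is classified correctly.
    d = -sign * (combined_score(weights, features) - threshold)
    # Smooth 0/1 loss and the scale of its derivative with respect to d.
    loss = 1.0 / (1.0 + math.exp(-slope * d))
    grad_scale = slope * loss * (1.0 - loss)
    # d(d)/d(w_k) = -sign * f_k, so gradient descent moves w_k along sign * f_k.
    updated = [w + lr * grad_scale * sign * f for w, f in zip(weights, features)]
    # Assumed constraint: non-negative weights that sum to one.
    updated = [max(w, 0.0) for w in updated]
    total = sum(updated)
    return [w / total for w in updated] if total > 0 else list(weights)


if __name__ == "__main__":
    # Hypothetical scores: amplitude, zero crossing rate, spectral, GMM likelihood.
    weights = [0.25, 0.25, 0.25, 0.25]
    frame = [1.0, 0.2, 0.1, -0.5]
    print(mce_update(weights, frame, is_speech=True))
```

In this sketch, a feature that agrees with the frame's label gains weight and one that disagrees loses weight, which is the general behaviour one would expect from adapting combination weights to a given noise condition.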
Bibliographic reference. Kida, Yusuke / Kawahara, Tatsuya (2006): "Evaluation of voice activity detection by combining multiple features with weight adaptation", In INTERSPEECH-2006, paper 1152-Wed3A1O.4.