12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Linear Dynamic Models for Voice Activity Detection

Kannu Mehta (1), Chau Khoa Pham (2), Eng Siong Chng (2)

(1) IIT Roorkee, India
(2) Nanyang Technological University, Singapore

In this paper, we propose a robust voice activity detection (VAD) method based on the long-term stationarity (LTS) of the speech signal. The approach is motivated by the observation that noise, in the time domain, is more stationary than speech. We describe the use of linear dynamic models (LDMs) to measure the long-term stationarity of the signal, and we propose a voice activity detector that compares the degree of stationarity at different times in the signal. We evaluate the proposed approach in the presence of five types of noise at various SNR levels. Comparison with the G.729 Annex B, order statistics filter (OSF) VAD, windowed autocorrelation lag energy (WALE), and autocorrelation zero-crossing rate (AZR) schemes demonstrates that the accuracy of the LTS-based VAD scheme, averaged over all noise types and all SNRs, is 3.94% better than that of the best of the considered VAD schemes.
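
The idea of comparing the degree of stationarity at different times can be illustrated with a minimal sketch. It does not implement the LDM-based measure of the paper, whose details are not given in this abstract. As a stand-in, it scores non-stationarity by the variance of short-frame log energies over a longer context and flags a frame as voice when that score exceeds a threshold. The function names, the frame length (80 samples), the context (10 frames), the threshold (0.2), and the synthetic test signal are all assumptions chosen for illustration.

```python
import math
import random


def frame_log_energies(signal, frame_len):
    """Log energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-12))  # small floor avoids log(0)
    return energies


def nonstationarity(signal, frame_len=80, context=10):
    """Stand-in stationarity score (NOT the paper's LDM measure):
    variance of log energy over the last `context` frames."""
    log_e = frame_log_energies(signal, frame_len)
    scores = []
    for i in range(len(log_e)):
        window = log_e[max(0, i - context + 1):i + 1]
        mean = sum(window) / len(window)
        scores.append(sum((v - mean) ** 2 for v in window) / len(window))
    return scores


def detect_voice(signal, threshold=0.2, frame_len=80, context=10):
    """Flag frames whose recent history is less stationary than `threshold`."""
    return [s > threshold for s in nonstationarity(signal, frame_len, context)]


def make_demo_signal(rate=8000, seed=0):
    """Hypothetical test signal: 1 s of stationary white noise, then 1 s of
    the same noise plus a 200 Hz tone with a slow 4 Hz amplitude envelope."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(rate)]
    for n in range(rate):
        t = n / rate
        envelope = 0.5 * (1.0 + math.sin(2.0 * math.pi * 4.0 * t))
        tone = 10.0 * envelope * math.sin(2.0 * math.pi * 200.0 * t)
        signal.append(rng.gauss(0.0, 1.0) + tone)
    return signal


signal = make_demo_signal()
decisions = detect_voice(signal)  # one boolean per 80-sample frame
```

With an 8 kHz sampling rate, the first 100 entries of `decisions` cover the noise-only second and the remaining entries cover the modulated segment. A score based on energy alone would not separate speech from non-stationary noise, which is one reason a model-based stationarity measure such as the one proposed here may be preferable.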

Bibliographic reference.  Mehta, Kannu / Pham, Chau Khoa / Chng, Eng Siong (2011): "Linear dynamic models for voice activity detection", In INTERSPEECH-2011, 2617-2620.