In this paper, we propose a robust voice activity detection (VAD) method based on the long-term stationarity (LTS) of the speech signal. The approach is motivated by the fact that, in the time domain, noise is relatively more stationary than speech. We describe the use of linear dynamic models (LDMs) to measure the long-term stationarity of the signal and propose a voice activity detector that compares the degree of stationarity at different times in the signal. We evaluate the proposed approach in the presence of five types of noise at various SNR levels. Comparison with the G.729 Annex B, order statistics filter (OSF) VAD, windowed autocorrelation lag energy (WALE), and autocorrelation zero-crossing rate (AZR) schemes demonstrates that the accuracy of the LTS-based VAD scheme, averaged over all noise types and all SNRs, is 3.94% better than that of the best among the considered VAD schemes.
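The idea of scoring a signal by how stationary it is over a long window can be illustrated with a minimal Python sketch. This is not the LDM-based measure of the paper: as a simplifying assumption, it swaps in a plain autoregressive (AR) model fitted by least squares, and asks how much worse a model fitted on the first half of a window predicts the second half than a model fitted on the second half itself. The function names, the AR order, the window and hop sizes, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def _lagged(x, order):
    # Column k holds x[n-1-k] for n = order .. len(x)-1.
    return np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )

def ar_fit(x, order):
    # Least-squares AR coefficients: x[n] ~ sum_k a[k] * x[n-1-k].
    a, *_ = np.linalg.lstsq(_lagged(x, order), x[order:], rcond=None)
    return a

def ar_residual_power(x, a):
    # Mean squared one-step prediction error of x under coefficients a.
    return float(np.mean((x[len(a):] - _lagged(x, len(a)) @ a) ** 2))

def nonstationarity(window, order=4):
    # Log ratio of the second half's prediction error under the
    # first-half model to its error under its own model.
    # Near 0 for a stationary window, larger when the signal changes.
    window = np.asarray(window, dtype=float)
    half = len(window) // 2
    first, second = window[:half], window[half:]
    cross = ar_residual_power(second, ar_fit(first, order))
    own = ar_residual_power(second, ar_fit(second, order))
    return float(np.log((cross + 1e-12) / (own + 1e-12)))

def vad_decisions(signal, win=800, hop=400, threshold=0.5, order=4):
    # Hypothetical decision rule: flag a window as speech-like when
    # its non-stationarity score exceeds an assumed threshold.
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - win + 1, hop)
    return np.array(
        [nonstationarity(signal[s:s + win], order) > threshold for s in starts]
    )
```

Under these assumptions, a window of white noise should score close to zero, while a window whose second half has a different spectral structure from its first half should score well above it. A practical detector would need a threshold tuned per noise condition, which this sketch does not attempt.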
Bibliographic reference. Mehta, Kannu / Pham, Chau Khoa / Chng, Eng Siong (2011): "Linear dynamic models for voice activity detection", In INTERSPEECH-2011, 2617-2620.