ISCA Archive Interspeech 2013

Speaker and noise independent voice activity detection

François G. Germain, Dennis L. Sun, Gautham J. Mysore

Voice activity detection (VAD) in the presence of heavy, nonstationary noise is a challenging problem that has attracted attention in recent years. Most modern VAD systems require training on highly specialized data: either labeled mixtures of speech and noise that are matched to the application, or, at the very least, noise data similar to that encountered in the application. Because obtaining labeled data can be a laborious task in practical applications, it is desirable for a voice activity detector to be able to perform well in the presence of any type of noise without the need for matched training data. In this paper, we propose a VAD method based on non-negative matrix factorization. We train a universal speech model from a corpus of clean speech but do not train a noise model. Rather, the universal speech model is sufficient to detect the presence of speech in noisy signals. Our experimental results show that our technique is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and significantly outperforms baseline algorithms.
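The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea, assuming a plain semi-supervised NMF with a KL-divergence objective and multiplicative updates; it should not be read as the authors' exact method. Speech basis vectors (here `W_speech`, assumed to be pre-trained on clean speech) are held fixed, a few noise basis vectors are fitted to the noisy magnitude spectrogram itself, and a frame is marked as speech when the speech bases explain most of its reconstructed energy. The function names `semi_supervised_nmf` and `detect_speech`, the number of noise bases, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def semi_supervised_nmf(V, W_fixed, n_free, n_iter=200, seed=0):
    """Approximate V (freq x frames) by [W_fixed, W_free] @ H.

    W_fixed stays untouched; only the n_free extra columns and the
    activations H are updated (KL-divergence multiplicative updates).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_fixed = W_fixed.shape[1]
    W_free = rng.random((n_freq, n_free)) + EPS
    W = np.hstack([W_fixed.astype(float), W_free])
    H = rng.random((W.shape[1], n_frames)) + EPS

    for _ in range(n_iter):
        # Update all activations.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        # Update only the free (noise) basis vectors.
        R = V / (W @ H + EPS)
        W_new = W * (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        W[:, k_fixed:] = W_new[:, k_fixed:]
    return W, H


def detect_speech(V, W_speech, n_noise=4, threshold=0.5, n_iter=200):
    """Return (boolean speech mask per frame, speech energy ratio per frame).

    The threshold on the energy ratio is an assumed, illustrative
    decision rule, not one taken from the paper.
    """
    W, H = semi_supervised_nmf(V, W_speech, n_noise, n_iter=n_iter)
    k = W_speech.shape[1]
    speech_energy = (W[:, :k] @ H[:k]).sum(axis=0)
    total_energy = (W @ H).sum(axis=0) + EPS
    ratio = speech_energy / total_energy
    return ratio > threshold, ratio
```

In this sketch, `V` would be a non-negative magnitude spectrogram of the noisy recording and `W_speech` a matrix whose columns are spectral bases learned from clean speech. Because the noise bases are unconstrained, they can also absorb speech energy on real data; a practical system would need some additional mechanism to discourage that, which this sketch omits.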

doi: 10.21437/Interspeech.2013-204

Cite as: Germain, F.G., Sun, D.L., Mysore, G.J. (2013) Speaker and noise independent voice activity detection. Proc. Interspeech 2013, 732-736, doi: 10.21437/Interspeech.2013-204

  author={François G. Germain and Dennis L. Sun and Gautham J. Mysore},
  title={{Speaker and noise independent voice activity detection}},
  booktitle={Proc. Interspeech 2013},