ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2006)

Pittsburgh, PA, USA
September 16, 2006

Study of Noise Robust Voice Activity Detection Based on Periodic Component to Aperiodic Component Ratio

Kentaro Ishizuka, Tomohiro Nakatani

NTT Communication Science Laboratories, NTT Corporation, Keihanna Science City, Kyoto, Japan

This paper describes a study of noise-robust voice activity detection (VAD) that uses the periodic component to aperiodic component ratio (PAR). Although environmental sound changes dynamically in the real world, conventional noise-robust features for VAD are sensitive to the non-stationarity of noise, which causes the signal-to-noise ratio to vary, and they sometimes require a priori estimates of the noise power. To overcome this problem, we adopt the PAR as an acoustic feature for VAD that is insensitive to the non-stationarity of noise. Hearing research also suggests that the decomposition of sound into periodic and aperiodic components plays an important role in the human auditory system. The proposed method first estimates the PAR of the observed signal with a harmonic filter in the frequency domain. It then detects the presence of target speech based on a voice activity likelihood defined in terms of the PAR. The performance of the proposed VAD algorithm was examined using simulated and real noisy speech data. Comparisons confirmed that the proposed algorithm outperforms conventional VAD algorithms, particularly in the presence of non-stationary noise.
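The two steps named above (estimate a PAR in the frequency domain, then decide on speech presence from it) can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that the fundamental frequency `f0` is already known, stands in for the harmonic filter with a simple mask over FFT bins near multiples of `f0`, and replaces the voice activity likelihood with a fixed threshold. The names `par_estimate` and `is_speech` and the parameters `half_width_hz` and `threshold` are hypothetical.

```python
import numpy as np

def par_estimate(frame, fs, f0, half_width_hz=20.0):
    """Rough periodic-to-aperiodic power ratio of one frame.

    Assumption: f0 (Hz) is given; a real system would have to estimate it.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Index of the nearest harmonic of f0 for every bin (at least the 1st).
    k = np.maximum(np.round(freqs / f0), 1.0)
    # Bins within half_width_hz of a harmonic count as "periodic".
    harmonic_mask = np.abs(freqs - k * f0) <= half_width_hz
    periodic = power[harmonic_mask].sum()
    aperiodic = power[~harmonic_mask].sum()
    return periodic / (aperiodic + 1e-12)  # small constant avoids 0/0

def is_speech(frame, fs, f0, threshold=1.0):
    """Hypothetical decision rule: speech if the PAR exceeds a threshold."""
    return bool(par_estimate(frame, fs, f0) > threshold)

# Usage: a synthetic harmonic tone versus white noise.
fs, n, f0 = 8000, 512, 250.0
t = np.arange(n) / fs
voiced = sum(np.sin(2 * np.pi * h * f0 * t) for h in range(1, 6))
noise = np.random.default_rng(0).standard_normal(n)
print("PAR (harmonic tone):", par_estimate(voiced, fs, f0))
print("PAR (white noise):  ", par_estimate(noise, fs, f0))
```

Because the ratio is computed within each frame, it does not need a separate estimate of the noise power, which is the property the abstract highlights. The mask width and threshold here are illustrative values only.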

