INTERSPEECH 2006 - ICSLP
This paper describes a low-complexity, efficient speech classifier for noisy environments. The proposed algorithm exploits the time-scale analysis provided by wavelet decomposition to classify speech frames as voiced, unvoiced, or silence. The classifier uses a single multidimensional feature, extracted by applying the Teager energy operator to the wavelet coefficients. The feature is enhanced and compared with quantile-based adaptive thresholds to detect the phonetic classes. Furthermore, to save memory, the adaptive thresholds are replaced by a slope-tracking method applied to the filtered feature. These algorithms are tested on the TIMIT database with additive white, car, and factory noise, and compared with other methods to demonstrate their superior performance and robustness.
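As a rough illustration of the kind of feature described above, the following sketch computes the discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], on the detail coefficients of a wavelet decomposition of a single frame. It is not the authors' implementation: the Haar wavelet, the three decomposition levels, the averaging of absolute Teager energy per level, and the function names (`haar_dwt`, `teager_energy`, `frame_feature`) are all assumptions made for this example. The feature enhancement, quantile-based thresholds, and slope tracking are not shown because the abstract does not specify them.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail).
    The Haar wavelet is an assumption; the abstract does not name a wavelet."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].
    The two boundary samples are set to zero."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def frame_feature(frame, levels=3):
    """Hypothetical multidimensional feature: mean absolute Teager energy
    of the detail coefficients at each decomposition level."""
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(np.mean(np.abs(teager_energy(detail))))
    return np.array(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(256)
    voiced_like = np.sin(2 * np.pi * 0.02 * n)   # low-frequency, periodic
    unvoiced_like = 0.3 * rng.standard_normal(256)  # noise-like
    print("voiced-like:  ", frame_feature(voiced_like))
    print("unvoiced-like:", frame_feature(unvoiced_like))
```

A classifier in the spirit of the abstract would then compare such a per-frame feature against thresholds; how those thresholds are derived is left to the paper itself.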
Bibliographic reference. Pham, Tuan Van / Kubin, Gernot (2006): "Low-complexity and efficient classification of voiced/unvoiced/silence for noisy environments", In INTERSPEECH-2006, paper 1400-Thu1A1O.6.