A novel and robust approach to content-based speech/non-speech audio classification is proposed, based on sparse representation (SR) features and Gaussian process classifiers (GPCs). Projections of the noise-robust sparse representations of audio signals, computed by L1-norm minimization, are used as features. GPCs are used to learn and predict audio categories. In contrast to Support Vector Machines (SVMs), for which determining the hyperparameters is difficult, GPCs estimate them with a Bayesian model-selection criterion. Experimental results on real-world audio datasets show that the SR features are more robust to audio variations than mel-frequency cepstral coefficients (MFCCs) and that the proposed approach outperforms SVMs.
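The abstract's feature-extraction step, solving an L1-regularized reconstruction over a dictionary and using the sparse coefficient vector as the feature, can be illustrated with a minimal sketch. This is not the authors' exact pipeline: the dictionary here is random (a real system would learn it from audio frames), and ISTA (iterative soft-thresholding) is just one standard L1-minimization solver.

```python
import numpy as np

def sr_features(x, D, lam=0.05, n_iter=200):
    """Sparse representation of a signal frame x over dictionary D via
    ISTA, solving min_a 0.5*||x - D a||^2 + lam*||a||_1.
    The sparse coefficient vector a is used as the feature."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# Toy demo: a frame that is truly sparse over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
coef = np.zeros(128)
coef[[3, 40, 99]] = [1.0, -0.5, 0.8]
x = D @ coef
a = sr_features(x, D)
print("large coefficients:", np.count_nonzero(np.abs(a) > 0.1))
```

In the paper's setting, the recovered coefficient vectors (or their projections) would then be fed to a classifier; the GPC side amounts to maximizing the marginal likelihood to select kernel hyperparameters, the Bayesian criterion contrasted with SVM cross-validation above.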
Bibliographic reference. Shi, Ziqiang / Han, Jiqing / Zheng, Tieran (2011): "Real-world speech/non-speech audio classification based on sparse representation features and GPCs", In INTERSPEECH-2011, 2401-2404.