13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals

Po-Sen Huang (1), Jianchao Yang (1), Mark Hasegawa-Johnson (1), Feng Liang (2), Thomas S. Huang (1)

(1) Department of Electrical and Computer Engineering; (2) Department of Statistics;
University of Illinois at Urbana-Champaign, USA

In recent years, designing the coding and pooling structures in layered networks has been shown to be a useful method for learning high-level feature representations for visual data. Yet such learning structures have not been extensively studied for audio signals. In this paper, we investigate different pooling strategies based on the sparse coding scheme and propose a temporal pyramid pooling method to extract discriminative and shift-invariant feature representations. We demonstrate that the new feature representation outperforms traditional features on the acoustic event classification task.

Index Terms: sparse coding, pooling, acoustic event classification
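
As an illustration of the temporal pyramid pooling idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the sparse codes are already computed as one coefficient vector per frame, that pooling is a maximum over absolute coefficient values, and that each pyramid level splits the time axis into twice as many segments as the level before. The function name `temporal_pyramid_pool` and the `levels` parameter are hypothetical.

```python
def temporal_pyramid_pool(codes, levels=3):
    """Max-pool absolute sparse codes over 1, 2, 4, ... time segments.

    codes: list of frames, each a list of coefficients (one per dictionary atom).
    Returns a flat feature list of length n_atoms * (2**levels - 1).
    Assumption: max pooling of absolute values; the paper may use other operators.
    """
    n_frames = len(codes)
    n_atoms = len(codes[0])
    features = []
    for level in range(levels):
        n_segments = 2 ** level
        for s in range(n_segments):
            # Integer boundaries cover every frame even when n_frames
            # is not divisible by n_segments.
            start = s * n_frames // n_segments
            stop = (s + 1) * n_frames // n_segments
            segment = codes[start:stop]
            for k in range(n_atoms):
                # An empty segment (more segments than frames) pools to 0.0.
                features.append(
                    max((abs(frame[k]) for frame in segment), default=0.0)
                )
    return features


# Toy input: the same activation of atom 1, placed early or late in the clip.
early = [[0.0, 0.0] for _ in range(8)]
early[1][1] = -2.0
late = [[0.0, 0.0] for _ in range(8)]
late[6][1] = -2.0

feat_early = temporal_pyramid_pool(early, levels=2)
feat_late = temporal_pyramid_pool(late, levels=2)
```

In this toy case the coarsest level (the first two entries) is identical for both inputs, which is the shift-invariant part, while the finer level records whether the activation fell in the first or second half of the clip.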


Bibliographic reference.  Huang, Po-Sen / Yang, Jianchao / Hasegawa-Johnson, Mark / Liang, Feng / Huang, Thomas S. (2012): "Pooling robust shift-invariant sparse representations of acoustic signals", In INTERSPEECH-2012, 2518-2521.