12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification

Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj

Carnegie Mellon University, USA

In this paper, we attempt to represent audio as a sequence of acoustic units learned without supervision, and to use these units for multi-class classification. We expect the acoustic units to represent sounds or sound sequences, so that together they form an automatically derived sound alphabet. We use audio from multi-class YouTube-quality multimedia data to converge on a set of sound units, such that each audio file is represented as a sequence of these units. We then learn category language models over sequences of the acoustic units, and use them to generate acoustic and language model scores for each category. Finally, we use a margin-based classification algorithm to weight the category scores and predict the class to which each test data point belongs. We compare different settings and report encouraging results on this task.
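
The category language model step can be illustrated with a minimal sketch. It assumes each audio file has already been converted to a sequence of integer unit IDs, and it uses bigram models with add-one smoothing; the abstract does not state the model order or the smoothing scheme, so both are assumptions. The function names and toy data are hypothetical, and the sketch leaves out the acoustic scores and the margin-based weighting of category scores, picking the highest-scoring category instead.

```python
import math
from collections import Counter


def train_bigram_lm(sequences, vocab_size):
    """Fit an add-one-smoothed bigram model over unit-ID sequences.

    Returns a function that maps a sequence to its log-probability.
    The bigram order and the smoothing are illustrative assumptions.
    """
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    def log_prob(seq):
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = bigrams[(prev, cur)] + 1
            den = contexts[prev] + vocab_size
            total += math.log(num / den)
        return total

    return log_prob


def classify(seq, models):
    """Pick the category whose language model scores the sequence highest."""
    return max(models, key=lambda category: models[category](seq))


# Toy unit sequences per category (hypothetical data, 4 units in the alphabet).
train = {
    "music": [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1]],
    "speech": [[2, 3, 3, 2, 3, 3], [3, 3, 2, 3, 3]],
}
models = {c: train_bigram_lm(seqs, vocab_size=4) for c, seqs in train.items()}
predicted = classify([0, 1, 0, 1], models)
```

In a fuller pipeline, the per-category scores produced by `models[c](seq)` would be passed as features to a separate classifier rather than compared directly.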

Bibliographic reference. Chaudhuri, Sourish / Harvilla, Mark / Raj, Bhiksha (2011): "Unsupervised learning of acoustic unit descriptors for audio content representation and classification", In INTERSPEECH-2011, 2265-2268.