13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Event-based Video Retrieval Using Audio

Qin Jin, Peter Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Multimedia Event Detection (MED) is an annual task in the NIST TRECVID evaluation that requires participants to build indexing and retrieval systems for locating videos that show certain predefined events. Typical systems rely heavily on visual data. Audio data, however, also contains rich information that can be used effectively for video retrieval, and MED could benefit from the attention of researchers in audio analysis. We present several systems for performing MED using only audio data, report the results of each system on the TRECVID MED 2011 development dataset, and compare the strengths and weaknesses of each approach.

Index Terms: multimedia event detection, audio processing, video retrieval

Bibliographic reference. Jin, Qin / Schulam, Peter / Rawat, Shourabh / Burger, Susanne / Ding, Duo / Metze, Florian (2012): "Event-based video retrieval using audio", in INTERSPEECH-2012, 2085-2088.