The amount of multimedia data is increasing every day, and there is a growing demand for high-accuracy multimedia retrieval systems that go beyond retrieving simple events (e.g., detecting a sports video) to more specific and hard-to-detect events (e.g., a point in a tennis match). To retrieve these complex events, audio content features play an important role, since they provide information complementary to image/video features. In this paper, we propose a novel approach in which we employ an HMM-based acoustic concept recognition (ACR) system and convert the resulting recognition lattices into acoustic concept indexes that represent multimedia audio content. Lattice indexes are created by extracting posterior-weighted N-gram counts from the ACR lattices, and they are used as features in SVM-based classification for the multimedia event detection (MED) task. We evaluate the proposed approach on the NIST 2011 TRECVID MED development set, which consists of user-generated videos from the internet. The proposed approach yields an Equal Error Rate (EER) of 31.6% on this acoustically challenging dataset (on a set of 5 video events), outperforming previously proposed supervised and unsupervised approaches on the same dataset (34.5% and 36.9%, respectively).
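To make the idea of posterior-weighted N-gram counts concrete, the following is a minimal sketch of one common way to compute expected N-gram counts over a lattice with the forward-backward algorithm. It is not taken from the paper: the function name `expected_ngram_counts`, the edge-list lattice representation, the assumption that node ids are topologically ordered, and the concept labels in the toy lattice are all illustrative assumptions.

```python
from collections import defaultdict


def expected_ngram_counts(edges, start, end, max_order=2):
    """Expected (posterior-weighted) N-gram counts over a lattice.

    Assumptions (illustrative, not from the paper):
    - edges is a list of (src, dst, label, weight) tuples,
    - the lattice is acyclic and every edge satisfies src < dst,
    - weights are non-negative, possibly unnormalized, path scores.
    """
    # Forward pass: alpha[n] = total weight of partial paths start -> n.
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for src, dst, _, w in sorted(edges, key=lambda e: e[0]):
        alpha[dst] += alpha[src] * w

    # Backward pass: beta[n] = total weight of partial paths n -> end.
    beta = defaultdict(float)
    beta[end] = 1.0
    for src, dst, _, w in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[src] += w * beta[dst]

    total = alpha[end]  # total weight of all complete paths
    if total <= 0.0:
        raise ValueError("no path from start to end")

    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)

    # Enumerate every run of up to max_order consecutive edges and add
    # the posterior mass of all complete paths passing through that run.
    counts = defaultdict(float)
    for src, dst, label, w in edges:
        stack = [(dst, (label,), alpha[src] * w)]
        while stack:
            node, labels, weight = stack.pop()
            counts[labels] += weight * beta[node] / total
            if len(labels) < max_order:
                for _, nxt, nxt_label, nxt_w in outgoing[node]:
                    stack.append((nxt, labels + (nxt_label,), weight * nxt_w))
    return dict(counts)


# Toy lattice with hypothetical concept labels and unnormalized weights.
toy_lattice = [
    (0, 1, "speech", 3.0),
    (0, 1, "music", 1.0),
    (1, 2, "cheer", 1.0),
]
toy_counts = expected_ngram_counts(toy_lattice, start=0, end=2)
```

Under these assumptions, the resulting dictionary maps each N-gram (a tuple of concept labels) to its expected count, and one such dictionary per video could be turned into a sparse feature vector for an SVM classifier.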
Bibliographic reference. Castan, Diego / Akbacak, Murat (2013): "Indexing multimedia documents with acoustic concept recognition lattices", In INTERSPEECH-2013, 2644-2648.