First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)
In the inEvent EU project, we aim at structuring, retrieving, and sharing large archives of networked and dynamically changing multimedia recordings, mainly consisting of meetings, videoconferences, and lectures. More specifically, we are developing an integrated system that performs audiovisual processing of multimedia recordings and labels them in terms of interconnected hyper-events (a notion inspired by hypertext). Each hyper-event is composed of simpler facets, including audio-video recordings and metadata, which are then easier to search, retrieve, and share. In the present paper, we mainly cover the audio processing aspects of the system, including speech recognition, speaker diarization and linking (across recordings), the use of these features for hyper-event indexing and recommendation, and the search portal. We present initial results for feature extraction from lecture recordings using TED talks.
Index Terms: Networked multimedia events; audio processing: speech recognition; speaker diarization and linking; multimedia indexing and searching; hyper-events.
Bibliographic reference. Bourlard, Hervé / Ferràs, Marc / Pappas, Nikolaos / Popescu-Belis, Andrei / Renals, Steve / McInnes, Fergus / Bell, Peter J. / Ingram, Sandy / Guillemot, Mael (2013): "Processing and linking audio events in large multimedia archives: the EU inevent project", In SLAM-2013, 3-8.