Retrieving information from the ever-increasing amount of unannotated audio and video recordings requires techniques such as unsupervised pattern discovery or query-by-example. In this paper we focus on queries specified as an audio snippet, excised from the target recordings, that contains the desired word or expression. The task is to retrieve all, and only, the instances whose match score with the query meets an absolute criterion. For this purpose we introduce a distance measure between two acoustic vectors that can be calibrated in a completely unsupervised manner. This measure also enables a fast matching approach that skips more than 97% of the full-fledged DTW computations without affecting precision or recall. We demonstrate the effectiveness of these proposals with query-by-example experiments on a read speech corpus for English and a spontaneous speech corpus for Dutch.
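The abstract says an absolute match criterion makes it possible to skip most full DTW computations. The paper's own fast matching method is not described here, so the sketch below does not reproduce it. It only shows a generic mechanism with the same effect, early abandoning. Local distances are non-negative, and every warping path crosses every row of the cost matrix. So once the smallest accumulated cost in a row exceeds the threshold, the final DTW cost must exceed it too, and the computation can stop. The function names, the Euclidean frame distance, and the unnormalized accumulated cost are all illustrative assumptions. They are not the paper's calibrated measure.

```python
import math
from typing import Callable, Optional, Sequence

Frame = Sequence[float]


def euclid(a: Frame, b: Frame) -> float:
    """Hypothetical frame distance; the paper uses its own calibrated measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_with_threshold(
    query: Sequence[Frame],
    segment: Sequence[Frame],
    threshold: float,
    dist: Callable[[Frame, Frame], float] = euclid,
) -> Optional[float]:
    """Return the unnormalized DTW cost, or None as soon as it must exceed threshold.

    Illustrative early-abandoning DTW (hypothetical helper, not the paper's method).
    """
    n, m = len(query), len(segment)
    # Row 0 of the accumulated-cost matrix: D[0][0] = 0, D[0][j] = inf for j > 0.
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)  # D[i][0] = inf for i > 0
        for j in range(1, m + 1):
            d = dist(query[i - 1], segment[j - 1])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        # Every path to (n, m) crosses row i, and costs only grow,
        # so if the whole row is above threshold the match is rejected.
        if min(cur[1:]) > threshold:
            return None
        prev = cur
    return prev[m]


if __name__ == "__main__":
    q = [[0.0], [1.0], [2.0]]
    print(dtw_with_threshold(q, [[0.0], [1.0], [1.0], [2.0]], 0.5))  # 0.0
    print(dtw_with_threshold(q, [[5.0], [5.0]], 1.0))  # None, abandoned after row 1
```

Early abandoning is exact only for an accumulated cost that never decreases along the path. A length-normalized DTW score would need a different bound.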
Bibliographic reference. Gubian, Michele / Boves, Lou / Versteegh, Maarten (2013): "Calibration of distance measures for unsupervised query-by-example", In INTERSPEECH-2013, 2639-2643.