The amount of Web-based multimedia data that includes speech is increasing rapidly. Spoken term detection (STD) enables rapid identification of candidates for the desired information in a large quantity of speech data. Because the user ultimately has to check these STD candidates one at a time, a long candidate list is undesirable. However, setting an appropriate cutoff threshold for a particular STD request beforehand is not easy. In this work, we propose a novel indexing and search method for STD that requires no cutoff threshold for detection and instead outputs detection results in increasing order of their dynamic time warping (DTW) distances for a given query term. Our experimental evaluation showed that the strict algorithm of our method returned detection results exactly in increasing order of their DTW distances, whereas its relaxed variants executed much faster at the cost of only a slight loss in the exact ordering.
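The abstract does not describe the proposed index, so the following is only a brute-force point of reference for what "threshold-free, DTW-distance-ordered" output means, not the authors' algorithm. It is a minimal sketch under several assumptions: utterances and the query are plain symbol sequences (for example phoneme labels), the local cost is 0 for a match and 1 otherwise, and the names `subsequence_dtw` and `ranked_detections` are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, Iterator, Sequence, Tuple


def subsequence_dtw(query: Sequence[str], doc: Sequence[str]) -> float:
    """Smallest DTW distance between `query` and any contiguous span of `doc`."""
    if not doc:
        return float("inf")
    # Row 0 is all zeros: a match may start at any position in `doc` for free.
    prev = [0.0] * (len(doc) + 1)
    for q in query:
        # Column 0 stays infinite: a query symbol cannot align with "nothing".
        cur = [float("inf")] * (len(doc) + 1)
        for j, d in enumerate(doc, start=1):
            cost = 0.0 if q == d else 1.0  # assumed 0/1 local cost
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    # The match may also end at any position in `doc`.
    return min(prev[1:])


def ranked_detections(
    query: Sequence[str], docs: Dict[str, Sequence[str]]
) -> Iterator[Tuple[float, str]]:
    """Yield (distance, utterance_id) in increasing order of DTW distance.

    Brute force: every utterance is scored up front, then a heap releases
    results one at a time. Ties are broken by utterance id.
    """
    heap = [(subsequence_dtw(query, doc), doc_id) for doc_id, doc in docs.items()]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


# Illustrative data only; these utterances are made up for the example.
docs = {
    "u1": ["s", "k", "a", "t", "o"],
    "u2": ["k", "o", "t"],
    "u3": ["m", "o", "p"],
}
# The caller decides how many candidates to inspect; no threshold is set.
top_two = list(islice(ranked_detections(["k", "a", "t"], docs), 2))
# top_two == [(0.0, "u1"), (1.0, "u2")]
```

Because the results come from a generator, the user can stop after any number of candidates. The cost of this baseline is that every utterance is scored before the first result appears, which is the kind of exhaustive work an index-based search is meant to avoid.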
Bibliographic reference. Ohno, Teppei / Akiba, Tomoyosi (2013): "DTW-distance-ordered spoken term detection", In INTERSPEECH-2013, 3737-3741.