13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Indexing Raw Acoustic Features for Scalable Zero Resource Search

Aren Jansen, Benjamin Van Durme

Human Language Technology Center of Excellence, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA

We present a new speech indexing and search scheme called Randomized Acoustic Indexing and Logarithmic-time Search (RAILS) that enables scalable query-by-example spoken term detection in the zero resource regime. RAILS is derived from our recent investigation into applying randomized hashing and approximate nearest neighbor search algorithms to raw acoustic features. Our approach permits an approximate search through hundreds of hours of speech audio in a matter of seconds, and may be applied to any language without the need for a training corpus, acoustic model, or pronunciation lexicon. The fidelity of the approximation is controlled by a small number of easily interpretable parameters that trade off search accuracy against speed.
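
As a rough illustration of the kind of randomized hashing the abstract refers to, the sketch below applies sign-random-projection locality sensitive hashing to feature frames and ranks indexed frames by Hamming distance to a query frame. It is a minimal sketch under stated assumptions, not the RAILS implementation: the function names, the 64-bit signature length, and the toy 3-dimensional frames are all hypothetical, and the ranking step is a plain linear scan standing in for the logarithmic-time search named in the abstract.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes of dimension dim (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(frame, hyperplanes):
    """Pack the signs of the frame's projections into one integer bit signature."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, frame))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; smaller means a smaller angle between frames."""
    return bin(a ^ b).count("1")


def nearest_frames(query_sig, index_sigs, k=3):
    """Linear-scan stand-in: indices of the k signatures closest in Hamming distance."""
    order = sorted(range(len(index_sigs)),
                   key=lambda i: hamming(query_sig, index_sigs[i]))
    return order[:k]


# Toy usage with made-up 3-dimensional "frames" (real acoustic features are larger).
planes = make_hyperplanes(num_bits=64, dim=3)
frames = [[0.9, 0.1, -0.3], [-0.9, -0.1, 0.3], [0.2, 0.8, 0.5]]
index = [signature(f, planes) for f in frames]
query = signature([1.8, 0.2, -0.6], planes)  # frame 0 scaled by 2
print(nearest_frames(query, index, k=2))
```

In this sketch, the signature length plays the role of an accuracy versus speed parameter: more bits give a finer approximation of the angle between frames at the cost of more computation and storage per frame.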

Index Terms: speech indexing, zero resource, query-by-example, spoken term detection, locality sensitive hashing

Bibliographic reference. Jansen, Aren / Van Durme, Benjamin (2012): "Indexing raw acoustic features for scalable zero resource search", in INTERSPEECH-2012, 2466-2469.