Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

Shane Settle, Keith Levin, Herman Kamper, Karen Livescu


Query-by-example search often uses dynamic time warping (DTW) to compare queries with proposed matching segments. Recent work has shown that representing speech segments as fixed-dimensional vectors (acoustic word embeddings) and measuring the distance between them (e.g., cosine distance) can discriminate between words more accurately than DTW-based approaches. We consider an approach to query-by-example search that embeds both the query and the database segments using a neural model, followed by nearest-neighbor search to find the matching segments. Earlier work on embedding-based query-by-example, using template-based acoustic word embeddings, achieved competitive performance. We find that our embeddings, based on recurrent neural networks trained to optimize word discrimination, achieve substantial improvements in performance and run-time efficiency over the previous approaches.
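The search step described in the abstract (embed the query and every database segment, then rank segments by cosine distance to the query) can be sketched as below. This is a minimal, brute-force illustration, not the authors' implementation: it assumes the database segments have already been extracted and embedded, and the segment IDs and 3-dimensional vectors are made-up stand-ins for the outputs of a trained recurrent embedding model. The names `cosine_distance` and `search` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def search(query: Vector, database: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force k-nearest-neighbor search: score every segment, keep the k closest."""
    scored = [(seg_id, cosine_distance(query, emb)) for seg_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical embeddings standing in for neural acoustic word embeddings.
    db = {
        "utt1_seg0": [0.9, 0.1, 0.0],
        "utt1_seg1": [0.0, 1.0, 0.0],
        "utt2_seg0": [-1.0, 0.0, 0.0],
    }
    hits = search([1.0, 0.0, 0.0], db, k=2)
    print([seg for seg, _ in hits])  # prints ['utt1_seg0', 'utt1_seg1']
```

Because each comparison is a single vector operation rather than a DTW alignment over frame sequences, the per-segment cost is small; for large databases, the linear scan here could be replaced by an approximate nearest-neighbor index.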


DOI: 10.21437/Interspeech.2017-1592

Cite as: Settle, S., Levin, K., Kamper, H., Livescu, K. (2017) Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings. Proc. Interspeech 2017, 2874-2878, DOI: 10.21437/Interspeech.2017-1592.


@inproceedings{Settle2017,
  author={Shane Settle and Keith Levin and Herman Kamper and Karen Livescu},
  title={Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2874--2878},
  doi={10.21437/Interspeech.2017-1592},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1592}
}