Deep Hashing for Speaker Identification and Retrieval

Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu, Wu-Jun Li

Speaker identification and retrieval are widely used in real applications. To overcome the inefficiency of real-valued representations, several speaker hashing methods have been proposed that learn binary codes as representations. However, these hashing methods are based on i-vectors and cannot achieve satisfactory retrieval accuracy because they cannot learn discriminative feature representations. In this paper, we propose a novel deep hashing method, called deep additive margin hashing (DAMH), to improve retrieval performance for the speaker identification and retrieval task. Unlike existing speaker hashing methods, DAMH performs feature learning and binary code learning jointly by integrating the two procedures into an end-to-end architecture. Experimental results on the large-scale audio dataset VoxCeleb2 show that DAMH outperforms existing speaker hashing methods and achieves state-of-the-art performance.
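To illustrate why binary codes make retrieval cheaper than real-valued representations, the following minimal sketch ranks database entries by Hamming distance to a query code. It is not the DAMH method: the sign-thresholding step and the function names `binarize`, `hamming_distance`, and `retrieve` are assumptions made for illustration, whereas DAMH learns its codes end to end.

```python
def binarize(embedding):
    """Pack the signs of a real-valued vector into an integer bit code.

    Illustrative assumption: positive entries map to 1, all others to 0.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Count the bit positions where two codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Hypothetical toy embeddings; a real system would use learned speaker features.
database = [binarize(v) for v in ([-0.2, -1.0, 0.7], [0.9, -0.3, 1.1], [-0.5, 0.4, -0.8])]
query = binarize([0.5, -1.0, 2.0])
nearest = retrieve(query, database, top_k=2)
```

Comparing two codes takes a single XOR and a bit count, which is the source of the efficiency gain over distance computations on real-valued vectors.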

 DOI: 10.21437/Interspeech.2019-2457

Cite as: Fan, L., Jiang, Q., Yu, Y., Li, W. (2019) Deep Hashing for Speaker Identification and Retrieval. Proc. Interspeech 2019, 2908-2912, DOI: 10.21437/Interspeech.2019-2457.

  author={Lei Fan and Qing-Yuan Jiang and Ya-Qi Yu and Wu-Jun Li},
  title={{Deep Hashing for Speaker Identification and Retrieval}},
  booktitle={Proc. Interspeech 2019},