Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition

Shai Rozenberg, Hagai Aronowitz, Ron Hoory

With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing. The x-vector, an embedding approach based on a deep neural network (DNN), is considered the state of the art when proper end-to-end training is not feasible. However, accuracy decreases significantly when recording conditions (noise, sampling rate, etc.) are mismatched, either between the x-vector training data and the target data or between enrollment and test data. We introduce Siamese x-vector Reconstruction (SVR) for domain adaptation: using a lean auxiliary Siamese DNN, we reconstruct the embedding of a higher-quality signal from its lower-quality counterpart. We evaluate our method on several mismatch scenarios and demonstrate significant improvement over the baseline.
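The reconstruction idea can be illustrated with a toy sketch. It is not the authors' implementation: it replaces the Siamese DNN with a single linear map, uses 4-dimensional synthetic vectors in place of real x-vectors, and simulates mismatch with a made-up `degrade` function. All names and parameters here are hypothetical. It shows only the core step of learning a map from a lower-quality embedding to its higher-quality counterpart by minimizing squared error.

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding size (assumption); real x-vectors are much larger


def degrade(x):
    """Hypothetical mismatch: per-dimension scaling, mixing, and additive noise."""
    scale = [1.0, 0.3, 1.5, 0.6]
    return [scale[i] * x[i] + 0.5 * x[(i + 1) % DIM] + random.gauss(0, 0.05)
            for i in range(DIM)]


def make_pairs(n):
    """Return n (low_quality, high_quality) synthetic embedding pairs."""
    pairs = []
    for _ in range(n):
        hi = [random.gauss(0, 1) for _ in range(DIM)]
        pairs.append((degrade(hi), hi))
    return pairs


def apply_map(W, x):
    """Matrix-vector product W @ x."""
    return [sum(W[i][j] * x[j] for j in range(DIM)) for i in range(DIM)]


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def train(pairs, epochs=100, lr=0.02):
    """Fit W by per-sample gradient descent on the squared reconstruction error."""
    # Start from the identity, so the untrained map returns the degraded input.
    W = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
    for _ in range(epochs):
        for lo, hi in pairs:
            pred = apply_map(W, lo)
            for i in range(DIM):
                err = pred[i] - hi[i]
                for j in range(DIM):
                    W[i][j] -= lr * err * lo[j]
    return W


train_pairs = make_pairs(200)
test_pairs = make_pairs(100)
W = train(train_pairs)

# Mean cosine similarity to the high-quality embedding, before and after mapping.
before = sum(cosine(lo, hi) for lo, hi in test_pairs) / len(test_pairs)
after = sum(cosine(apply_map(W, lo), hi) for lo, hi in test_pairs) / len(test_pairs)
print(f"before: {before:.3f}  after: {after:.3f}")
```

In this sketch the learned map is applied to held-out degraded vectors, and the mean cosine similarity to the clean vectors is compared with and without the map. A real system would operate on x-vectors extracted from paired recordings and would score speaker trials with a back end rather than a direct cosine to the clean embedding.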

DOI: 10.21437/Interspeech.2020-1742

Cite as: Rozenberg, S., Aronowitz, H., Hoory, R. (2020) Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition. Proc. Interspeech 2020, 1526-1529, DOI: 10.21437/Interspeech.2020-1742.

  author={Shai Rozenberg and Hagai Aronowitz and Ron Hoory},
  title={{Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition}},
  booktitle={Proc. Interspeech 2020},