Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data

Seyyed Saeed Sarfjoo, Srikanth Madikeri, Petr Motlicek, Sébastien Marcel


To adapt a speaker verification (SV) system to a target domain with limited data, this paper investigates transfer learning from a model pre-trained on source-domain data. To that end, layer-by-layer adaptation with transfer learning from the initial and final layers of the pre-trained model is investigated. We show that the model adapted from the initial layers outperforms the model adapted from the final layers. Based on this evidence, and inspired by work in the image recognition field, we hypothesize that low-level convolutional neural network (CNN) layers characterize domain-specific components, while high-level CNN layers are domain-independent and have more discriminative power. To adapt these domain-specific components, angular margin softmax (AMSoftmax) is applied to the CNN-based implementation of the x-vector architecture. In addition, to reduce over-fitting on the limited target data, transfer learning on the batch normalization layers is investigated. Estimating the mean shift and covariance of the batch normalization layers allows the represented components of the target domain to be mapped to the source domain. Using the TDNN and E-TDNN versions of the x-vector as baseline models, the adapted models outperformed the baselines on the development set of NIST SRE 2018, with relative improvements of 11.0% and 13.8%, respectively.
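The angular-margin softmax mentioned above subtracts an additive margin from the target class's cosine similarity before scaling and applying cross-entropy, which forces tighter per-speaker clusters. A minimal single-sample sketch of this loss is shown below; the scale `s` and margin `m` values are illustrative hyperparameters, not the settings used in the paper, and the function name is our own.

```python
import math

def am_softmax_loss(embedding, weights, target, s=30.0, m=0.2):
    """Additive-margin (AM-)softmax loss for one sample (illustrative sketch).

    embedding: speaker embedding (list of floats)
    weights:   one weight vector per speaker class
    target:    index of the true speaker
    s, m:      scale and additive margin (example values, not from the paper)
    """
    def l2norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    e = l2norm(embedding)
    # Cosine similarity between the embedding and each normalized class weight.
    cosines = [sum(a * b for a, b in zip(e, l2norm(w))) for w in weights]
    # Subtract the margin m only from the target-class cosine, then scale by s.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Standard cross-entropy over the margin-adjusted logits
    # (log-sum-exp computed stably).
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[target]
```

Because the margin is subtracted before the softmax, a sample perfectly aligned with its class weight still incurs a small loss, pushing embeddings beyond mere correct classification toward a cosine margin between speakers.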


 DOI: 10.21437/Interspeech.2020-2342

Cite as: Sarfjoo, S.S., Madikeri, S., Motlicek, P., Marcel, S. (2020) Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data. Proc. Interspeech 2020, 3815-3819, DOI: 10.21437/Interspeech.2020-2342.


@inproceedings{Sarfjoo2020,
  author={Seyyed Saeed Sarfjoo and Srikanth Madikeri and Petr Motlicek and Sébastien Marcel},
  title={{Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3815--3819},
  doi={10.21437/Interspeech.2020-2342},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2342}
}