Adversarial Separation and Adaptation Network for Far-Field Speaker Verification

Lu Yi, Man-Wai Mak


Typically, speaker verification systems are highly optimized on speech collected by close-talking microphones. However, these systems perform poorly when users speak through far-field microphones during verification. In this paper, we propose an adversarial separation and adaptation network (ADSAN) to extract speaker-discriminative and domain-invariant features through adversarial learning. The idea is based on the notion that a speaker embedding comprises domain-specific components and domain-shared components, and that the two can be disentangled by the interplay of the separation network and the adaptation network in the ADSAN. We also propose incorporating a mutual information neural estimator into the domain adaptation network to retain speaker-discriminative information. Experiments on the VOiCES Challenge 2019 demonstrate that the proposed approaches produce more domain-invariant and speaker-discriminative representations, which helps reduce the domain shift caused by different types of microphones and reverberant environments.
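The abstract mentions a mutual information neural estimator (MINE) used to retain speaker-discriminative information. Such estimators maximize the Donsker-Varadhan lower bound on mutual information. The sketch below is illustrative only and is not the paper's implementation: it evaluates that bound on toy data with a fixed quadratic statistic standing in for the trained statistics network, and all names are assumptions.

```python
import numpy as np

def dv_mi_lower_bound(T, x, y, rng):
    """Donsker-Varadhan bound: I(X;Y) >= E_joint[T(x,y)] - log E_marg[exp(T(x,y'))].

    MINE trains T to maximize this quantity; here T is a fixed toy statistic.
    """
    joint_term = T(x, y).mean()
    # Shuffling y breaks the pairing, giving samples from the product of marginals.
    y_shuffled = y[rng.permutation(len(y))]
    marginal_term = np.log(np.exp(T(x, y_shuffled)).mean())
    return joint_term - marginal_term

rng = np.random.default_rng(0)

# Toy correlated data: y is a noisy copy of x, so I(X;Y) > 0.
x = rng.normal(size=(4096, 1))
y = x + 0.1 * rng.normal(size=(4096, 1))

# Scaled inner product as a stand-in for a learned statistics network.
T = lambda a, b: 0.5 * (a * b).sum(axis=1)

mi_est = dv_mi_lower_bound(T, x, y, rng)
print(f"DV lower-bound estimate: {mi_est:.3f}")  # positive for correlated data
```

In the ADSAN setting, maximizing such a bound between the adapted embedding and the speaker identity would encourage the adaptation network to keep speaker information while the adversarial branch removes domain information.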


DOI: 10.21437/Interspeech.2020-2372

Cite as: Yi, L., Mak, M.-W. (2020) Adversarial Separation and Adaptation Network for Far-Field Speaker Verification. Proc. Interspeech 2020, 4298-4302, DOI: 10.21437/Interspeech.2020-2372.


@inproceedings{Yi2020,
  author={Lu Yi and Man-Wai Mak},
  title={{Adversarial Separation and Adaptation Network for Far-Field Speaker Verification}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4298--4302},
  doi={10.21437/Interspeech.2020-2372},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2372}
}