Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations

Wei Xia, John H.L. Hansen


In this study, we propose global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations. We use global context information to enhance important channels and recalibrate salient time-frequency locations by computing the similarity between the global context and local features. The proposed modules, together with a popular ResNet-based model, are evaluated on the VoxCeleb1 dataset, a large-scale speaker verification corpus collected in the wild. This lightweight block can be easily incorporated into a CNN model at little additional computational cost, and it improves speaker verification performance by a large margin over both the baseline ResNet-LDE model and the Squeeze&Excitation block. Detailed ablation studies are also performed to analyze various factors that may impact the performance of the proposed modules. We find that employing the proposed L2-tf-GTFC transformation block decreases the Equal Error Rate from 4.56% to 3.07%, a relative reduction of 32.68%, and yields a relative improvement of 27.28% in the DCF score. The results indicate that the proposed global context guided transformation modules can efficiently improve the learned speaker representations by achieving time-frequency and channel-wise feature recalibration.
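The recalibration idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the average-pooled global context, the sigmoid gates, the cosine (L2-normalized) similarity, and the parameters `w` and `b` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def global_context(x):
    """Average a (C, F, T) feature map over frequency and time -> (C,).
    Assumption: average pooling stands in for the global context."""
    return x.mean(axis=(1, 2))


def channel_transform(x, w, b):
    """Scale each channel by a gate computed from the global context.
    w (C, C) and b (C,) are hypothetical learned parameters."""
    gate = sigmoid(w @ global_context(x) + b)  # one gate per channel, shape (C,)
    return x * gate[:, None, None]


def time_frequency_transform(x, l2_normalize=True, eps=1e-8):
    """Scale each (f, t) location by its similarity to the global context.
    Assumption: dot-product similarity, made cosine by L2 normalization."""
    g = global_context(x)  # (C,)
    local = x
    if l2_normalize:
        g = g / (np.linalg.norm(g) + eps)
        local = x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)
    sim = np.einsum("c,cft->ft", g, local)  # one similarity per location, (F, T)
    return x * sigmoid(sim)[None, :, :]


# Toy usage: 4 channels, 5 frequency bins, 6 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
w = np.eye(4)    # placeholder for learned weights
b = np.zeros(4)  # placeholder for learned bias
y = time_frequency_transform(channel_transform(x, w, b))
```

In this sketch both gates lie strictly between 0 and 1, so the output keeps the input shape and only rescales feature magnitudes; a trained module would learn `w` and `b` jointly with the backbone network.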


DOI: 10.21437/Interspeech.2020-1845

Cite as: Xia, W., Hansen, J.H. (2020) Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations. Proc. Interspeech 2020, 3226-3230, DOI: 10.21437/Interspeech.2020-1845.


@inproceedings{Xia2020,
  author={Wei Xia and John H.L. Hansen},
  title={{Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3226--3230},
  doi={10.21437/Interspeech.2020-1845},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1845}
}