Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification

Siqi Zheng, Yun Lei, Hongbin Suo


In this paper, we propose an end-to-end phonetically-aware coupled network for short duration speaker verification tasks. Phonetic information is shown to be beneficial for identifying short utterances. A coupled network structure is proposed to exploit phonetic information. The coupled convolutional layers allow the network to provide frame-level supervision based on phonetic representations of the corresponding frames. The end-to-end training scheme, which uses a triplet loss function, provides a direct comparison of speech contents between two utterances and hence enables phonetic-based normalization. Our systems are compared against the current mainstream speaker verification systems on both the NIST SRE and VoxCeleb evaluation datasets. Relative reductions of up to 34% in equal error rate are reported.
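
For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of the generic margin-based form, not the authors' implementation. The abstract does not give the distance measure, the margin, or the embedding dimension, so the Euclidean distance, the margin of 0.2, the two-dimensional toy embeddings, and the function names are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic margin-based triplet loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values, for illustration only).
anchor = [1.0, 0.0]
same_speaker = [1.0, 0.0]
other_speaker = [0.0, 1.0]

# Well-separated triplet: the margin is already satisfied, so the loss is zero.
easy = triplet_loss(anchor, same_speaker, other_speaker)

# Swapped roles: the "positive" is far away and the "negative" is identical,
# so the loss is large.
hard = triplet_loss(anchor, other_speaker, same_speaker)

print(easy, hard)
```

In training, a loss of this kind is typically averaged over many sampled triplets and minimized with respect to the parameters of the embedding network.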


DOI: 10.21437/Interspeech.2020-1306

Cite as: Zheng, S., Lei, Y., Suo, H. (2020) Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification. Proc. Interspeech 2020, 926-930, DOI: 10.21437/Interspeech.2020-1306.


@inproceedings{Zheng2020,
  author={Siqi Zheng and Yun Lei and Hongbin Suo},
  title={{Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={926--930},
  doi={10.21437/Interspeech.2020-1306},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1306}
}