Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition

Raphaël Duroselle, Denis Jouvet, Irina Illina


State-of-the-art language recognition systems are based on discriminative embeddings called x-vectors. Channel and gender distortions produce mismatch in the x-vector space, where embeddings corresponding to the same language are not grouped in a unique cluster. To control this mismatch, we propose to train the x-vector DNN with metric learning objective functions. Combining a classification loss with the metric learning n-pair loss improves language recognition performance. Such a system achieves robustness comparable to that of a system trained with a domain adaptation loss function, but without using the domain information. We also analyze the mismatch due to channel and gender, in comparison to language proximity, in the x-vector space. This is done with the Maximum Mean Discrepancy (MMD) divergence measure between groups of x-vectors. Our analysis shows that using the metric learning loss function reduces gender and channel mismatch in the x-vector space, even for languages observed on only one channel in the training set.
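The abstract names two techniques without defining them: a classification loss combined with the n-pair loss, and the MMD between groups of x-vectors. The NumPy sketch below illustrates both under stated assumptions, and it is not the authors' implementation. The n-pair term follows the multi-class N-pair formulation of Sohn (2016) on inner-product similarities. The mixing weight `weight` and the RBF bandwidth `sigma` are hypothetical parameters. MMD uses a Gaussian kernel with the biased estimator. The abstract does not specify the paper's kernel, weighting, or batch sampling scheme.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def n_pair_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Multi-class N-pair loss (Sohn, 2016), illustrative version.

    anchors[i] and positives[i] are two embeddings of class i (e.g. language i);
    every positives[j] with j != i acts as a negative for anchors[i].
    """
    sim = anchors @ positives.T  # (N, N) inner-product similarities
    # log(1 + sum_{j != i} exp(s_ij - s_ii)) equals logsumexp_j(s_ij) - s_ii
    per_pair = logsumexp(sim, axis=1) - np.diag(sim)
    return float(np.mean(per_pair))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Softmax cross-entropy (the classification loss), averaged over the batch."""
    log_probs = logits - logsumexp(logits, axis=1)[:, None]
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def combined_loss(logits, labels, anchors, positives, weight: float = 1.0) -> float:
    """Classification loss plus a weighted n-pair term (weight is hypothetical)."""
    return cross_entropy(logits, labels) + weight * n_pair_loss(anchors, positives)


def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two groups of embeddings."""
    return float(
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for x-vectors: group_c is shifted, as a "mismatched" group.
    group_a = rng.normal(0.0, 1.0, size=(200, 16))
    group_b = rng.normal(0.0, 1.0, size=(200, 16))
    group_c = rng.normal(2.0, 1.0, size=(200, 16))
    print("MMD^2(a, b):", mmd2(group_a, group_b, sigma=4.0))
    print("MMD^2(a, c):", mmd2(group_a, group_c, sigma=4.0))
```

With the synthetic data in the demo, the shifted group gives a much larger MMD than the group drawn from the same distribution. In the paper's setting, the analogous comparison would be between x-vectors of the same language recorded on different channels or spoken by different genders. The bandwidth choice matters: with 16-dimensional inputs and `sigma=1.0`, almost all off-diagonal kernel values vanish and the estimate becomes uninformative, which is why the demo uses `sigma=4.0`.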


DOI: 10.21437/Interspeech.2020-1708

Cite as: Duroselle, R., Jouvet, D., Illina, I. (2020) Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition. Proc. Interspeech 2020, 447-451, DOI: 10.21437/Interspeech.2020-1708.


@inproceedings{Duroselle2020,
  author={Raphaël Duroselle and Denis Jouvet and Irina Illina},
  title={{Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={447--451},
  doi={10.21437/Interspeech.2020-1708},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1708}
}