Language Recognition Using Triplet Neural Networks

Victoria Mingote, Diego Castan, Mitchell McLaren, Mahesh Kumar Nandwana, Alfonso Ortega, Eduardo Lleida, Antonio Miguel

In this paper, we propose a novel neural network back-end approach based on triplets for the language recognition task, motivated by its successful application in the related field of text-dependent speaker verification. A triplet is a training example constructed from three audio samples: two from the same class and one from a different class. By presenting two pairs of samples to the network, the triplet neural network learns to discriminate between pairs of samples from the same language and pairs from different languages. In contrast to other triplet loss functions proposed in the literature, our triplet-based training optimizes the Area Under the Curve (AUC). Optimizing the AUC as the cost function is appropriate for a detection task, as it directly correlates with the end-use performance of the system. Moreover, we show the importance of defining an appropriate method of triplet selection and how it impacts system performance. When benchmarked on the LRE09 database, the new triplet back-end demonstrated superior performance compared to traditional back-ends used for language recognition. In addition, we evaluated the proposed systems on the LRE15 and LRE17 databases to assess their generalization power.
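The abstract does not give the exact loss, so the following is only a minimal sketch, under stated assumptions, of how an AUC objective can be computed over triplets. Each (anchor, positive, negative) triplet yields one same-language score and one different-language score, and the AUC is the probability that the former exceeds the latter; averaging over many triplets estimates it. The use of cosine similarity as the pair score, the sigmoid smoothing, the scale parameter `alpha`, and all function names are illustrative assumptions, not necessarily the formulation used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, used here (by assumption) as the pair score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (na * nb)


def triplet_scores(triplets: List[Triplet]) -> List[Tuple[float, float]]:
    """(same-language score, different-language score) for each triplet."""
    return [(cosine(a, p), cosine(a, n)) for a, p, n in triplets]


def empirical_auc(triplets: List[Triplet]) -> float:
    """Fraction of triplets where the same-language pair outscores the
    different-language pair (ties count as 0.5)."""
    scores = triplet_scores(triplets)
    hits = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp, sn in scores)
    return hits / len(scores)


def smooth_auc_loss(triplets: List[Triplet], alpha: float = 10.0) -> float:
    """Differentiable surrogate (hypothetical): the step function in the AUC
    is replaced by a sigmoid of scale alpha; the loss is 1 - smoothed AUC."""
    scores = triplet_scores(triplets)
    smooth = sum(1.0 / (1.0 + math.exp(-alpha * (sp - sn))) for sp, sn in scores)
    return 1.0 - smooth / len(scores)


if __name__ == "__main__":
    anchor, same, diff = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
    print(empirical_auc([(anchor, same, diff)]))                       # 1.0
    print(empirical_auc([(anchor, same, diff), (anchor, diff, same)]))  # 0.5
```

The sigmoid replaces the non-differentiable step in the AUC so that the objective can be minimized with gradient descent; a larger `alpha` makes the surrogate closer to the true AUC but yields sharper gradients.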

DOI: 10.21437/Interspeech.2019-2437

Cite as: Mingote, V., Castan, D., McLaren, M., Nandwana, M.K., Ortega, A., Lleida, E., Miguel, A. (2019) Language Recognition Using Triplet Neural Networks. Proc. Interspeech 2019, 4025-4029, DOI: 10.21437/Interspeech.2019-2437.

  author={Victoria Mingote and Diego Castan and Mitchell McLaren and Mahesh Kumar Nandwana and Alfonso Ortega and Eduardo Lleida and Antonio Miguel},
  title={{Language Recognition Using Triplet Neural Networks}},
  booktitle={Proc. Interspeech 2019},