A Speaker Recognition System for the SITW Challenge

Oleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexandr Kozlov

This paper presents the ITMO University system submitted to the Speakers in the Wild (SITW) Speaker Recognition Challenge. During the evaluation track of the SITW challenge, we explored conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector systems and recently developed DNN-posterior-based i-vector systems. The systems were investigated under the real-world media channel conditions represented in the challenge. This paper discusses practical issues in training robust i-vector systems and investigates a denoising autoencoder (DAE) based back-end applied to “in the wild” conditions. Our speaker diarization approach for “multi-speaker in the file” conditions is also briefly presented. Experiments performed on the evaluation dataset demonstrate that DNN-based i-vector systems are superior to the UBM-GMM based systems and that applying the DAE-based back-end helps to improve system performance.

DOI: 10.21437/Interspeech.2016-1197

Cite as

Kudashev, O., Novoselov, S., Simonchik, K., Kozlov, A. (2016) A Speaker Recognition System for the SITW Challenge. Proc. Interspeech 2016, 833-837.

author={Oleg Kudashev and Sergey Novoselov and Konstantin Simonchik and Alexandr Kozlov},
title={A Speaker Recognition System for the SITW Challenge},
booktitle={Interspeech 2016},