Co-whitening of I-vectors for Short and Long Duration Speaker Verification

Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang

An i-vector is a fixed-length, low-rank representation of a speech utterance. It has been used extensively in text-independent speaker verification. Ideally, speech utterances from the same speaker would map to a unique i-vector. However, this is not the case, owing to intrinsic and extrinsic factors such as the physical condition of the speaker, channel differences, noise and, notably, the duration of the speech utterances. In particular, we found that i-vectors extracted from short utterances exhibit larger variance than those extracted from long utterances. To address this problem, we propose a co-whitening approach that takes duration into account while maximizing the correlation between the i-vectors of short and long utterances. The proposed co-whitening method is derived from canonical correlation analysis (CCA). Experimental results on NIST SRE 2010 show that the co-whitening method is effective in compensating for the duration mismatch, leading to a reduction of up to 13.07% in equal error rate (EER).
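To make the idea concrete, the following is a minimal sketch of textbook CCA applied to paired long- and short-duration vectors. It is not the authors' derivation, which this abstract does not spell out. The function names (`inv_sqrt`, `fit_cca_whitening`), the regularization constant `reg`, and the assumption that row i of both inputs comes from the same speaker are all hypothetical choices made for illustration.

```python
import numpy as np


def inv_sqrt(mat, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, eps)  # guard against tiny or negative eigenvalues
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca_whitening(long_vecs, short_vecs, reg=1e-6):
    """Fit one whitening projection per duration condition with standard CCA.

    long_vecs, short_vecs: arrays of shape (n, dim); row i of each is assumed
    to come from the same speaker (an assumption of this sketch).
    Returns (mu_l, mu_s, A, B, corr): the two means, the two projections,
    and the canonical correlations in descending order.
    """
    n, dim = long_vecs.shape
    mu_l = long_vecs.mean(axis=0)
    mu_s = short_vecs.mean(axis=0)
    xc = long_vecs - mu_l
    yc = short_vecs - mu_s

    # Within-condition covariances (lightly regularized) and cross-covariance.
    c_ll = xc.T @ xc / (n - 1) + reg * np.eye(dim)
    c_ss = yc.T @ yc / (n - 1) + reg * np.eye(dim)
    c_ls = xc.T @ yc / (n - 1)

    # Whiten each condition separately, then rotate both with the SVD of the
    # whitened cross-covariance so that paired dimensions are maximally correlated.
    w_l = inv_sqrt(c_ll)
    w_s = inv_sqrt(c_ss)
    u, corr, vt = np.linalg.svd(w_l @ c_ls @ w_s)
    A = w_l @ u
    B = w_s @ vt.T
    return mu_l, mu_s, A, B, corr
```

Under these assumptions, a long-utterance vector `x` would be mapped to `(x - mu_l) @ A` and a short-utterance vector `y` to `(y - mu_s) @ B`. Both projected sets then have approximately identity covariance, so the larger variance of the short-utterance vectors is normalized away, and their cross-covariance is diagonal with the canonical correlations on the diagonal.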

DOI: 10.21437/Interspeech.2018-1246

Cite as: Xu, L., Lee, K.A., Li, H., Yang, Z. (2018) Co-whitening of I-vectors for Short and Long Duration Speaker Verification. Proc. Interspeech 2018, 1066-1070, DOI: 10.21437/Interspeech.2018-1246.

  author={Longting Xu and Kong Aik Lee and Haizhou Li and Zhen Yang},
  title={Co-whitening of I-vectors for Short and Long Duration Speaker Verification},
  booktitle={Proc. Interspeech 2018},