ISCA Archive Interspeech 2013

Exploring methods of improving speaker accuracy for speaker diarization

Mary Tai Knox, Nikki Mirghafori, Gerald Friedland

The focus of this work is to improve the speaker diarization error rate, and more specifically the speaker error rate. We investigate two methods of improving the speaker error rate: modifying the minimum duration constraint and incorporating novel purification techniques. First, in the final step of the speaker diarization algorithm we replace the minimum duration constraint with a simple smoothing algorithm, which averages the log-likelihoods for each of the hypothesized speakers. This method improves the speaker error rate by 12% relative for the MDM condition. Second, we utilize the difference between the largest and second largest log-likelihoods to identify frames which are believed to be correct (or "pure"). The difference value is shown to be more effective at separating correct frames from incorrect frames than the previously used maximum log-likelihood value. Using only the "pure" frames, the cluster models are retrained and segmentation is performed using the above-mentioned smoothing technique. The proposed purification and smoothing reduce the speaker error rate over the baseline; however, the result is worse than performing the smoothing step alone.
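The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the centered averaging window, the `half_window` parameter, the `threshold` value, and all function names are assumptions made for illustration, since the abstract does not specify them. Retraining the cluster models on the "pure" frames is omitted.

```python
def smooth_loglikes(loglikes, half_window):
    """Average each speaker's log-likelihoods over a centered window of frames.

    loglikes: list of frames, each a list of per-speaker log-likelihoods.
    half_window: assumed parameter, number of frames taken on each side.
    """
    n = len(loglikes)
    smoothed = []
    for t in range(n):
        # Clip the window at the start and end of the recording.
        window = loglikes[max(0, t - half_window):min(n, t + half_window + 1)]
        n_speakers = len(loglikes[t])
        smoothed.append(
            [sum(f[s] for f in window) / len(window) for s in range(n_speakers)]
        )
    return smoothed


def assign_speakers(loglikes):
    """Pick the speaker with the largest log-likelihood in each frame."""
    return [max(range(len(f)), key=lambda s: f[s]) for f in loglikes]


def margin(frame):
    """Difference between the largest and second largest log-likelihoods."""
    top, second = sorted(frame, reverse=True)[:2]
    return top - second


def pure_frames(loglikes, threshold):
    """Indices of frames whose margin reaches an assumed threshold."""
    return [t for t, f in enumerate(loglikes) if margin(f) >= threshold]


# Toy data: six frames, two hypothesized speakers (values are made up).
loglikes = [
    [-1.0, -3.0],
    [-1.2, -2.8],
    [-2.5, -2.0],  # isolated frame that favors speaker 1
    [-0.5, -4.0],
    [-3.0, -1.0],
    [-3.2, -0.9],
]

raw_labels = assign_speakers(loglikes)
smoothed_labels = assign_speakers(smooth_loglikes(loglikes, half_window=1))
pure = pure_frames(loglikes, threshold=1.0)
```

In this toy input, averaging over neighboring frames removes the isolated switch at the third frame, and the margin test excludes that same low-confidence frame from the "pure" set.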

doi: 10.21437/Interspeech.2013-637

Cite as: Knox, M.T., Mirghafori, N., Friedland, G. (2013) Exploring methods of improving speaker accuracy for speaker diarization. Proc. Interspeech 2013, 2783-2787, doi: 10.21437/Interspeech.2013-637

  author={Mary Tai Knox and Nikki Mirghafori and Gerald Friedland},
  title={{Exploring methods of improving speaker accuracy for speaker diarization}},
  booktitle={Proc. Interspeech 2013},