Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2

Xueshuai Zhang, Wenchao Wang, Pengyuan Zhang


This paper describes the ASRGroup team's speaker diarization system submitted to TRACK 2 of the Fearless Steps Challenge Phase-2. In this system, the similarity matrix among all segments of an audio recording is computed by Sequential Bidirectional Long Short-term Memory Networks (Bi-LSTM), and a clustering scheme based on the Density Peak Cluster Algorithm (DPCA) is proposed to cluster the segments. The system is compared with the Kaldi Toolkit diarization system (x-vectors based on a TDNN with a PLDA scoring model) and the Spectral system (Bi-LSTM similarity with the Spectral Clustering algorithm). Experiments show that our system significantly outperforms both systems, achieving a Diarization Error Rate (DER) of 42.75% and 39.52% on the Dev and Eval datasets of TRACK 2, respectively. Compared with the baseline Kaldi Toolkit diarization system and the Spectral Clustering algorithm with Bi-LSTM similarity models, our system reduces the DER by 4.64%, 1.84% and 8.85%, 7.57% absolute, respectively, on the two datasets.
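
The clustering step mentioned in the abstract can be illustrated with a minimal sketch of generic density peak clustering. This is not the authors' scheme: the function name `density_peak_cluster`, the Gaussian density kernel, the cutoff `d_c`, the fixed `n_clusters`, and the use of a distance matrix (for example, one minus a Bi-LSTM similarity score) are all assumptions made for illustration.

```python
import math


def density_peak_cluster(dist, n_clusters, d_c):
    """Label items given a symmetric distance matrix (list of lists).

    Illustrative density peak clustering; all parameters are hypothetical
    and not taken from the paper.
    """
    n = len(dist)
    # Local density: Gaussian-weighted count of neighbours within about d_c.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Indices sorted from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: -rho[i])
    global_max = max(max(row) for row in dist)

    delta = [0.0] * n            # distance to the nearest denser point
    nearest_denser = [-1] * n    # index of that point (-1 for the densest)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use the largest distance.
            delta[i] = global_max
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Cluster centres: points that are both dense and far from denser points.
    centers = sorted(range(n), key=lambda i: -(rho[i] * delta[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Visit points from densest to sparsest so each inherits an existing label.
    for i in order:
        if labels[i] == -1:
            j = nearest_denser[i]
            if j == -1:
                # Densest point was not picked as a centre: use the nearest centre.
                j = min(centers, key=lambda k: dist[i][k])
            labels[i] = labels[j]
    return labels
```

In a diarization setting, each item would be an audio segment and each label a hypothesized speaker; how the number of speakers and the cutoff are chosen is left open here.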


DOI: 10.21437/Interspeech.2020-1666

Cite as: Zhang, X., Wang, W., Zhang, P. (2020) Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2. Proc. Interspeech 2020, 2602-2606, DOI: 10.21437/Interspeech.2020-1666.


@inproceedings{Zhang2020,
  author={Xueshuai Zhang and Wenchao Wang and Pengyuan Zhang},
  title={{Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2602--2606},
  doi={10.21437/Interspeech.2020-1666},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1666}
}