The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02

Qingjian Lin, Tingle Li, Ming Li


This paper describes the systems developed by the DKU team for the Fearless Steps Challenge Phase-02 competition. For the Speech Activity Detection task, we start with a Long Short-Term Memory (LSTM) system and then introduce a ResNet-LSTM improvement. The ResNet-LSTM system reduces the DCF by about 38% relative to the LSTM baseline. We also examine system performance with additional training corpora included; the lowest DCF of 1.406% on the Eval Set is achieved with system pre-training. For the Speaker Identification task, we employ a deep ResNet vector system, which takes a variable-length feature sequence as input and directly generates speaker posteriors. Pre-training on VoxCeleb is also considered, and our best-performing system achieves a Top-5 accuracy of 92.393% on the Eval Set.
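To make the architecture described above concrete, the following is a minimal PyTorch sketch of a ResNet-LSTM model for frame-level speech activity detection. It is an illustrative assumption, not the authors' implementation: the layer counts, channel width, hidden size, and the use of 64-dimensional log-Mel features are hypothetical choices; only the overall pattern (a residual convolutional front-end feeding a recurrent layer that emits per-frame speech/non-speech posteriors) follows the paper's description.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic 2-D residual block over (time, frequency) feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

class ResNetLSTM(nn.Module):
    """ResNet front-end -> BiLSTM -> frame-level speech/non-speech logits.

    Hyperparameters (n_mels, channels, hidden) are illustrative assumptions.
    """
    def __init__(self, n_mels=64, channels=32, hidden=128):
        super().__init__()
        self.front = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            ResBlock(channels),
            ResBlock(channels),
        )
        self.lstm = nn.LSTM(channels * n_mels, hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 2)  # speech vs. non-speech

    def forward(self, feats):              # feats: (batch, frames, n_mels)
        x = feats.unsqueeze(1)             # add channel dim: (batch, 1, T, F)
        x = self.front(x)                  # (batch, C, T, F); T preserved
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)  # to (batch, T, C*F)
        x, _ = self.lstm(x)                # variable-length T is handled natively
        return self.fc(x)                  # per-frame logits, (batch, T, 2)

model = ResNetLSTM()
logits = model(torch.randn(2, 200, 64))    # 2 utterances, 200 frames each
print(logits.shape)                        # torch.Size([2, 200, 2])
```

Because no pooling is applied along the time axis, one logit pair is produced per input frame, so sequences of any length can be scored; softmax over the last dimension yields the frame-level speech posteriors.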


DOI: 10.21437/Interspeech.2020-1915

Cite as: Lin, Q., Li, T., Li, M. (2020) The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02. Proc. Interspeech 2020, 2607-2611, DOI: 10.21437/Interspeech.2020-1915.


@inproceedings{Lin2020,
  author={Qingjian Lin and Tingle Li and Ming Li},
  title={{The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={2607--2611},
  doi={10.21437/Interspeech.2020-1915},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1915}
}