Multi-PLDA Diarization on Children’s Speech

Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur

Children’s speech and other vocalizations pose challenges for speaker diarization. Children’s spontaneity causes rapid or delayed phonetic variations within an utterance, which makes speaker information difficult to extract. Fast speaker turns and long overlaps in conversations between children and their guardians make correct segmentation even harder than in, say, a business meeting. In this work, we explore diarization of child-guardian interactions. We investigate the effectiveness of adding children’s speech to adult data in Probabilistic Linear Discriminant Analysis (PLDA) training. We also train two PLDAs with separate objectives, one for a coarse and one for a fine classification of speakers, and examine a fusion of the two. With this fusion, we expect to improve performance on children’s speech while preserving adult segmentations. Our experimental results show that including children’s speech helps reduce the diarization error rate (DER) by 2.7%, achieving a best overall DER of 33.1% with the x-vector system. A fusion system yields a reasonable 33.3% DER that validates our concept.
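The abstract does not say how the two PLDAs are fused. As a rough illustration only, the sketch below shows one common option, score-level fusion by weighted averaging of two pairwise similarity matrices. The function name `fuse_plda_scores`, the `weight` parameter, and the toy score values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_plda_scores(coarse_scores, fine_scores, weight=0.5):
    """Weighted average of two pairwise score matrices.

    ASSUMPTION: this is a generic score-level fusion, not the authors'
    method. `coarse_scores` and `fine_scores` stand in for segment-by-segment
    similarity scores produced by two separately trained PLDA models.
    """
    coarse = np.asarray(coarse_scores, dtype=float)
    fine = np.asarray(fine_scores, dtype=float)
    if coarse.shape != fine.shape:
        raise ValueError("score matrices must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # weight = 1.0 keeps only the coarse model, weight = 0.0 only the fine one.
    return weight * coarse + (1.0 - weight) * fine


# Toy scores for two segments (made-up numbers, for illustration only).
coarse = [[1.0, 0.0], [0.0, 1.0]]
fine = [[3.0, 2.0], [2.0, 3.0]]
fused = fuse_plda_scores(coarse, fine, weight=0.25)
```

In a pipeline of this kind, the fused matrix would typically be passed to the clustering step in place of a single model's scores, with the weight tuned on held-out data.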

DOI: 10.21437/Interspeech.2019-2961

Cite as: Xie, J., García-Perera, L.P., Povey, D., Khudanpur, S. (2019) Multi-PLDA Diarization on Children’s Speech. Proc. Interspeech 2019, 376-380, DOI: 10.21437/Interspeech.2019-2961.

  author={Jiamin Xie and Leibny Paola García-Perera and Daniel Povey and Sanjeev Khudanpur},
  title={{Multi-PLDA Diarization on Children’s Speech}},
  booktitle={Proc. Interspeech 2019},