Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge

Claude Montacié, Marie-José Caraty


The INTERSPEECH 2020 ComParE Mask Sub-Challenge is to determine whether a speech signal was produced with or without a surgical mask. For this purpose, we investigated phonetic context and intelligibility measurements related to the speech changes caused by wearing a mask. Experiments were conducted on the Mask Augsburg Speech Corpus (MASC) and on the Mask Sorbonne Speech Corpus (MSSC), both in German. We investigated the effects of mask wearing on the acoustic properties of phonemes at the frame and segment levels. At the frame level, a phonetic mask detector was developed to determine which phonemes are most sensitive to mask wearing. At the segment level, a perceptual intelligibility score was developed and assessed on the MSSC. Two mask detection systems were developed and assessed on the MASC: the first used two large composite audio feature sets; the second used a bottom-up approach based on phonetic analysis and frame clustering. Experiments showed an improvement of 5.9% (absolute) on the Test set over the official Challenge baseline (71.8%).


DOI: 10.21437/Interspeech.2020-2243

Cite as: Montacié, C., Caraty, M. (2020) Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge. Proc. Interspeech 2020, 2062-2066, DOI: 10.21437/Interspeech.2020-2243.


@inproceedings{Montacié2020,
  author={Claude Montacié and Marie-José Caraty},
  title={{Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2062--2066},
  doi={10.21437/Interspeech.2020-2243},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2243}
}