DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification

Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth


In this study we address the three sub-challenges of the Interspeech ComParE Challenge 2017, where the goals are to identify child-directed speech, speakers having a cold, and different types of snoring sounds. For the first two sub-challenges we propose a simple, two-step feature extraction and classification scheme: first we perform frame-level classification via Deep Neural Networks (DNNs), and then we extract utterance-level features from the DNN outputs. Using these features for classification, we were able to match the performance of the standard paralinguistic approach (which involves extracting thousands of features, many of them completely irrelevant to the actual task). As for the Snoring Sub-Challenge, we divided the recordings into segments and averaged several frame-level features segment-wise, then used these averages for utterance-level classification. By combining the predictions of the proposed approaches with those obtained with the standard paralinguistic approach, we managed to outperform the baseline values of the Cold and Snoring sub-challenges on the hidden test sets.
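The second step of the two-step scheme can be illustrated with a minimal numpy sketch: given frame-level class posteriors produced by a DNN, aggregate them into a fixed-size utterance-level feature vector via simple statistics. The random posteriors and the choice of statistics (mean, standard deviation, minimum, maximum) are placeholder assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def utterance_features(frame_posteriors):
    """Aggregate frame-level class posteriors (shape: frames x classes)
    into one fixed-size utterance-level feature vector of statistics."""
    stats = [
        frame_posteriors.mean(axis=0),  # per-class mean posterior
        frame_posteriors.std(axis=0),   # per-class posterior spread
        frame_posteriors.min(axis=0),
        frame_posteriors.max(axis=0),
    ]
    return np.concatenate(stats)

# Toy stand-in for DNN outputs: 50 frames, 2 classes, rows sum to 1.
rng = np.random.default_rng(0)
p = rng.random((50, 1))
posteriors = np.hstack([p, 1.0 - p])

feats = utterance_features(posteriors)  # length = 4 statistics * 2 classes
```

The resulting vector (here 8-dimensional) can then be fed to any utterance-level classifier, replacing the thousands of generic paralinguistic features with a compact, task-derived representation.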


DOI: 10.21437/Interspeech.2017-905

Cite as: Gosztolya, G., Busa-Fekete, R., Grósz, T., Tóth, L. (2017) DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification. Proc. Interspeech 2017, 3522-3526, DOI: 10.21437/Interspeech.2017-905.


@inproceedings{Gosztolya2017,
  author={Gábor Gosztolya and Róbert Busa-Fekete and Tamás Grósz and László Tóth},
  title={DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={3522--3526},
  doi={10.21437/Interspeech.2017-905},
  url={http://dx.doi.org/10.21437/Interspeech.2017-905}
}