Deep Learning Based Open Set Acoustic Scene Classification

Zuzanna Kwiatkowska, Beniamin Kalinowski, Michał Kośmider, Krzysztof Rykaczewski


In this work, we compare the performance of three techniques for open set acoustic scene classification (ASC). We test thresholding of the softmax output of a deep network classifier, which is currently the most popular technique in ASC. We then compare the results with the Openmax classifier, which originates from the computer vision field. As the third model, we use the Adapted Class-Conditioned Autoencoder (Adapted C2AE), our variation of another computer vision technique called C2AE. Adapted C2AE allows a fairer comparison across the experiments and simplifies the original inference procedure, making it more applicable in real-life scenarios. We also analyse two training scenarios: one without additional knowledge of unknown classes and another where a limited subset of examples from the unknown classes is available. We find that the C2AE-based method outperforms thresholding and Openmax, obtaining 85.5% Area Under the Receiver Operating Characteristic curve (AUROC) and 66% open set accuracy on data used in the Detection and Classification of Acoustic Scenes and Events Challenge 2019 Task 1C.
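
As an illustration of the softmax thresholding baseline mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a classifier that outputs one logit per known scene class. The function names, the `UNKNOWN` label value, and the threshold of 0.5 are hypothetical choices made for this example only.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label used here for the "unknown" class


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predict_open_set(logits, threshold=0.5):
    """Return the argmax class, or UNKNOWN when the top probability is below threshold.

    The threshold value is an assumption for illustration, not a value from the paper.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, best, UNKNOWN)


# Two toy examples: one confident prediction, one near-uniform output.
logits = np.array([[4.0, 0.5, 0.1],
                   [1.0, 1.1, 0.9]])
preds = predict_open_set(logits, threshold=0.5)
```

In this sketch the first example is assigned to class 0, while the second has no class probability above the threshold and is therefore rejected as unknown. In practice the threshold would be tuned on held-out data.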


DOI: 10.21437/Interspeech.2020-3092

Cite as: Kwiatkowska, Z., Kalinowski, B., Kośmider, M., Rykaczewski, K. (2020) Deep Learning Based Open Set Acoustic Scene Classification. Proc. Interspeech 2020, 1216-1220, DOI: 10.21437/Interspeech.2020-3092.


@inproceedings{Kwiatkowska2020,
  author={Zuzanna Kwiatkowska and Beniamin Kalinowski and Michał Kośmider and Krzysztof Rykaczewski},
  title={{Deep Learning Based Open Set Acoustic Scene Classification}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1216--1220},
  doi={10.21437/Interspeech.2020-3092},
  url={http://dx.doi.org/10.21437/Interspeech.2020-3092}
}