Unsupervised Methods for Audio Classification from Lecture Discussion Recordings

Hang Su, Borislav Dzodzo, Xixin Wu, Xunying Liu, Helen Meng

The time allocated to lecturing and to student discussion is an important indicator for assessing classroom quality. Automatically classifying recording segments as lecture or discussion can therefore serve as an indicator of classroom activity in a flipped classroom setting. Lecture segments consist primarily of the lecturer's speech, while discussion segments include student speech, silence and noise. Multiple audio recorders simultaneously capture all class activities, and the recordings are coarsely synchronized to a common start time. We observe that the lecturer's speech tends to be common across recordings, whereas student discussions are captured only by the nearby device(s). We therefore window each recording using window durations of 0.5 s to 5 s at a 0.1 s analysis rate, and compute the normalized similarity between each window and temporally proximate windows in the other recordings. A histogram of these similarity scores is used to classify higher-similarity windows as lecture and lower-similarity windows as discussion. To improve classification performance, high-energy lecture windows and windows with very high or very low similarity are used to train a supervised model, which then reclassifies the remaining windows. Experimental results show that binary classification accuracy improves from 96.84% to 97.37%.
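The windowing, cross-recording similarity, histogram thresholding and supervised refinement described above can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The similarity is assumed to be the best zero-mean normalized cross-correlation within a small lag search, averaged over the other recordings. Otsu's method stands in for reading the threshold off the histogram, and a nearest-centroid classifier on similarity and log-energy stands in for the unspecified supervised model. The function names (`ncc`, `similarity`, `otsu`, `refine`), the 25% confidence fraction, and the toy 200 Hz synthetic recordings are all hypothetical, and only a single 0.5 s window length is shown.

```python
import math
import random

# Hypothetical sketch: the similarity measure, Otsu threshold and nearest-centroid
# refinement are illustrative assumptions, not the method used in the paper.

def ncc(a, b):
    """Zero-mean normalized correlation of two equal-length windows, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    den = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / den if den > 0 else 0.0

def similarity(recs, r, start, win, max_lag):
    """Mean over the other recordings of the best |NCC| within +/- max_lag samples
    of the same start time (recordings are only coarsely synchronized)."""
    w = recs[r][start:start + win]
    scores = []
    for o, other in enumerate(recs):
        if o == r:
            continue
        best = 0.0
        for lag in range(-max_lag, max_lag + 1):
            s = start + lag
            if s >= 0 and s + win <= len(other):
                best = max(best, abs(ncc(w, other[s:s + win])))
        scores.append(best)
    return sum(scores) / len(scores)

def otsu(values, bins=32):
    """Otsu's method: the histogram bin edge that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total, sum_all = len(values), sum(h * c for h, c in zip(hist, centers))
    best_t, best_var, w0, sum0 = lo + width, -1.0, 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 and w1:
            var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
            if var > best_var:
                best_t, best_var = lo + (i + 1) * width, var
    return best_t

def refine(sims, energies, labels, frac=0.25):
    """Confident windows (top/bottom `frac` by similarity, plus lecture-labeled
    windows with at least median lecture energy) train a nearest-centroid model on
    standardized [similarity, log-energy]; all other windows are relabeled by it."""
    n_win = len(sims)
    order = sorted(range(n_win), key=lambda i: sims[i])
    k = max(1, int(frac * n_win))
    train = set(order[:k]) | set(order[-k:])
    lect_e = sorted(energies[i] for i in range(n_win) if labels[i] == 1)
    if lect_e:
        e_med = lect_e[len(lect_e) // 2]
        train |= {i for i in range(n_win) if labels[i] == 1 and energies[i] >= e_med}
    feats = [(s, math.log(e + 1e-12)) for s, e in zip(sims, energies)]
    mu = [sum(feats[i][d] for i in train) / len(train) for d in (0, 1)]
    sd = [math.sqrt(sum((feats[i][d] - mu[d]) ** 2 for i in train) / len(train)) or 1.0
          for d in (0, 1)]
    z = [[(f[d] - mu[d]) / sd[d] for d in (0, 1)] for f in feats]
    cent = {}
    for c in (0, 1):
        members = [z[i] for i in train if labels[i] == c]
        if not members:  # a class has no confident windows: keep the initial labels
            return list(labels)
        cent[c] = [sum(m[d] for m in members) / len(members) for d in (0, 1)]
    def nearest(v):
        return min((0, 1), key=lambda c: sum((v[d] - cent[c][d]) ** 2 for d in (0, 1)))
    return [labels[i] if i in train else nearest(z[i]) for i in range(n_win)]

# Toy data (assumed): 3 recorders at 200 Hz for 4 s. First 2 s: one lecturer heard
# by every device with its own gain and delay; last 2 s: independent local "students".
rng = random.Random(0)
sr, n = 200, 800
source = [rng.gauss(0, 1) for _ in range(n + 20)]
recs = [[gain * source[t + 10 - delay] + 0.1 * rng.gauss(0, 1) if t < n // 2
         else rng.gauss(0, 1) for t in range(n)]
        for gain, delay in [(1.0, 0), (0.6, 3), (0.8, 5)]]
win, hop = int(0.5 * sr), int(0.1 * sr)  # 0.5 s windows at a 0.1 s analysis rate
starts = list(range(0, n - win + 1, hop))
sims = [similarity(recs, 0, s, win, max_lag=10) for s in starts]
energies = [sum(v * v for v in recs[0][s:s + win]) for s in starts]
thr = otsu(sims)
labels = [1 if s >= thr else 0 for s in sims]  # 1 = lecture, 0 = discussion
refined = refine(sims, energies, labels)
print(f"threshold={thr:.2f}", "".join(map(str, refined)))
```

Otsu's method is used here only because it chooses a threshold automatically from a bimodal histogram. The lag search (`max_lag`) absorbs the small offsets left by coarse synchronization; on real recordings it would need to cover the actual inter-device offset, which the abstract does not state.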

 DOI: 10.21437/Interspeech.2019-2384

Cite as: Su, H., Dzodzo, B., Wu, X., Liu, X., Meng, H. (2019) Unsupervised Methods for Audio Classification from Lecture Discussion Recordings. Proc. Interspeech 2019, 3347-3351, DOI: 10.21437/Interspeech.2019-2384.

  author={Hang Su and Borislav Dzodzo and Xixin Wu and Xunying Liu and Helen Meng},
  title={{Unsupervised Methods for Audio Classification from Lecture Discussion Recordings}},
  booktitle={Proc. Interspeech 2019},