The SRI Speech-Based Collaborative Learning Corpus

Colleen Richey, Cynthia D’Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg

We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for the investigation and measurement of how students collaborate in small groups. This is a multi-speaker corpus containing high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-cancelling microphone. Each group was also recorded via a stereo microphone placed nearby. A total of 80 sessions were collected with the participation of 134 students. The average duration of a session was 20 minutes. All students spoke English; for some students, English was a second language. Sessions have been annotated with time stamps to indicate which mathematical problem the students were solving and which student was speaking. Sessions have also been hand-annotated with common indicators of collaboration for each speaker (e.g., inviting others to contribute, planning) and the overall collaboration quality for each problem. The corpus will be useful to education researchers interested in collaborative learning and to speech researchers interested in children’s speech, speech analytics, and speech diarization. The corpus, both audio and annotation, will be made available to researchers.

DOI: 10.21437/Interspeech.2016-1541

Cite as

Richey, C., D’Angelo, C., Alozie, N., Bratt, H., Shriberg, E. (2016) The SRI Speech-Based Collaborative Learning Corpus. Proc. Interspeech 2016, 1550-1554.

author={Colleen Richey and Cynthia D’Angelo and Nonye Alozie and Harry Bratt and Elizabeth Shriberg},
title={The SRI Speech-Based Collaborative Learning Corpus},
booktitle={Interspeech 2016},