2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)

Penang, Malaysia
September 11-12, 2014

Unsupervised Recognition and Clustering of Speech Overlaps in Spoken Conversations

Shammur Absar Chowdhury, Giuseppe Riccardi, Firoj Alam

Department of Information Engineering and Computer Science, University of Trento, Italy

We are interested in understanding speech overlaps and their function in human conversations. Previous studies of speech overlaps have relied on supervised methods, small corpora, and controlled conversations. Characterizing overlaps in terms of timing, semantic, and discourse function requires analysis over a very large feature space. In this study, we automatically extracted a corpus of overlapping speech segments from human-human spoken conversations using a large-vocabulary automatic speech recognizer (ASR) and a turn segmenter. Each overlap instance is automatically projected onto a high-dimensional space of acoustic and lexical features. We then used unsupervised clustering to find distinct, well-separated clusters in this acoustic and lexical feature space. We evaluated the recognition and clustering algorithms on a large set of real human-human spoken conversations. The clusters were compared in terms of their feature distributions and of each feature's contribution to the automatic classification of the clusters.
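The abstract does not specify how overlaps are located or which clustering algorithm is used. As a minimal sketch, assuming the turn segmenter yields sorted, time-stamped (start, end) turns for each of two speakers, overlap segments can be found by intersecting the two speakers' turn intervals, and per-overlap feature vectors can then be grouped without labels. Plain k-means is swapped in here purely for illustration and is not claimed to be the authors' method; `find_overlaps`, `kmeans`, and the toy turn times and feature values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # hypothetical (start, end) turn format, in seconds


def find_overlaps(turns_a: List[Segment], turns_b: List[Segment]) -> List[Segment]:
    """Return the spans where speakers A and B talk at the same time.

    Assumes each list is sorted by start time and that a speaker's own
    turns do not overlap each other (an assumed turn-segmenter output).
    """
    overlaps = []
    i = j = 0
    while i < len(turns_a) and j < len(turns_b):
        start = max(turns_a[i][0], turns_b[j][0])
        end = min(turns_a[i][1], turns_b[j][1])
        if start < end:  # non-empty intersection = overlapped speech
            overlaps.append((start, end))
        # Advance the turn that finishes first; the other may still overlap.
        if turns_a[i][1] < turns_b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


def _sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20):
    """Illustrative k-means (stand-in clustering method, not the paper's)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for n, p in enumerate(points):
            labels[n] = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    speaker_a = [(0.0, 2.0), (3.0, 5.0)]
    speaker_b = [(1.5, 3.5), (4.8, 6.0)]
    print(find_overlaps(speaker_a, speaker_b))
    # [(1.5, 2.0), (3.0, 3.5), (4.8, 5.0)]

    # Hypothetical 2-D feature vectors, one per overlap instance.
    features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels, _ = kmeans(features, k=2)
    print(labels)
```

In a real setting each overlap would be mapped to a much higher-dimensional acoustic and lexical vector, and the feature scaling and choice of k would matter far more than in this toy example.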

Index Terms: Overlapping Speech, Human Conversation, Discourse, Language Understanding


Bibliographic reference.  Chowdhury, Shammur Absar / Riccardi, Giuseppe / Alam, Firoj (2014): "Unsupervised recognition and clustering of speech overlaps in spoken conversations", In SLAM-2014, 62-66.