13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization

Jürgen T. Geiger (1), Ravichander Vipperla (2), Simon Bozonnet (2), Nicholas Evans (2), Björn Schuller (1), Gerhard Rigoll (1)

(1) Institute for Human-Machine Communication, Technische Universität München, Germany
(2) Multimedia Communications Department, EURECOM, Sophia Antipolis, France

The effective handling of overlapping speech is at the limits of the current state of the art in speaker diarization. This paper presents our latest work in overlap detection. We report the combination of features derived through convolutive non-negative sparse coding with new energy, spectral and voicing-related features within a conventional HMM system. Overlap detection results are fully integrated into our top-down diarization system through the application of overlap exclusion and overlap labelling. Experiments on a subset of the AMI corpus show that the new system delivers significant reductions in missed speech and speaker error. Through overlap exclusion and labelling, the overall diarization error rate is shown to improve by 6.4% relative.
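
As an illustration of the two post-processing steps named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes frame-level features, a boolean overlap mask produced by some detector, one primary speaker index per frame, and per-frame speaker scores. The function names `exclude_overlap` and `label_overlap`, and the rule used to pick the second speaker (the highest-scoring speaker other than the primary one), are assumptions made purely for illustration.

```python
import numpy as np


def exclude_overlap(features, overlap_mask):
    """Overlap exclusion (sketch): drop frames flagged as overlap so that
    they do not contaminate speaker clustering or model training."""
    return features[~overlap_mask]


def label_overlap(primary, scores, overlap_mask):
    """Overlap labelling (sketch): for each frame flagged as overlap, assign
    a second speaker. Assumed rule: take the best-scoring speaker other than
    the primary one. Non-overlap frames get -1 (no second speaker)."""
    secondary = np.full(primary.shape, -1, dtype=int)
    for t in np.flatnonzero(overlap_mask):
        frame_scores = scores[t].astype(float)  # astype returns a copy
        frame_scores[primary[t]] = -np.inf      # rule out the primary speaker
        secondary[t] = int(np.argmax(frame_scores))
    return secondary


# Toy data (hypothetical values): 4 frames, 2-dim features, 3 speakers.
features = np.arange(8, dtype=float).reshape(4, 2)
overlap_mask = np.array([False, True, False, True])
primary = np.array([0, 0, 1, 1])
scores = np.array([
    [0.9, 0.1, 0.0],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
])

clean_features = exclude_overlap(features, overlap_mask)
secondary = label_overlap(primary, scores, overlap_mask)
```

In this sketch, exclusion acts before clustering while labelling acts on the final output, which is why the two are written as separate functions.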

Index Terms: speech overlap detection, convolutive non-negative sparse coding, speaker diarization


Bibliographic reference. Geiger, Jürgen T. / Vipperla, Ravichander / Bozonnet, Simon / Evans, Nicholas / Schuller, Björn / Rigoll, Gerhard (2012): "Convolutive non-negative sparse coding and new features for speech overlap handling in speaker diarization", In INTERSPEECH-2012, 2154-2157.