13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

DiarTk: An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings

Deepu Vijayasenan (1), Fabio Valente (2)

(1) Universität des Saarlandes, Saarbrücken, Germany
(2) Idiap Research Institute, Martigny, Switzerland

The speaker diarization task consists of inferring "who spoke when" in an audio stream without any prior knowledge, and it has been the object of several NIST international evaluation campaigns in recent years. A common trend for improving performance has been the use of several different feature streams, as diverse as speaker location features, visual features, or noise-robust acoustic features. This paper describes an open source toolkit, released under the GPL license, that aims to facilitate research in multistream speaker diarization and the reproduction of state-of-the-art results. In contrast to other related diarization toolkits, it is explicitly designed to handle an arbitrary number of features with very different statistics while limiting the computational complexity. The release includes a set of recipe scripts to replicate benchmark results on previous NIST evaluations and is intended to provide easy-to-use software for studying novel features and including them in diarization systems.

Index Terms: Open Source toolkit, Speaker Diarization, multistream features, NIST Rich Transcription


Bibliographic reference. Vijayasenan, Deepu / Valente, Fabio (2012): "DiarTk: an open source toolkit for research in multistream speaker diarization and its application to meetings recordings", In INTERSPEECH-2012, 2170-2173.