INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems

Janez Žibert (1), France Mihelič (2)

(1) University of Primorska, Slovenia
(2) University of Ljubljana, Slovenia

This work focuses on speaker clustering methods used in speaker diarization systems. The purpose of speaker clustering is to group segments that belong to the same speaker; it is usually applied in the last stage of the speaker diarization process. We concentrate on developing suitable representations of speaker segments for clustering and implement two speaker clustering systems. The first follows a standard approach: bottom-up agglomerative clustering with the Bayesian Information Criterion (BIC) as the merging criterion. The second is a fusion-based speaker clustering system in which speaker segments are modeled by acoustic and prosodic representations. In this way we additionally model the speakers' prosodic and phonetic characteristics and combine them with the basic acoustic information. This leads to improved clustering of segments in cases of similar speaker acoustic properties and in poor acoustic conditions.
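
As a rough illustration of the first system's principle, the following is a minimal sketch of bottom-up agglomerative clustering with a Bayesian Information Criterion merging test. It is not the authors' implementation. It assumes that each cluster is modeled by a single full-covariance Gaussian, that the widely used delta-BIC form with a tunable penalty weight applies, and that merging stops once no pair has a negative delta-BIC. The names `delta_bic`, `bic_cluster`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two segments (rows = frames, columns = features).

    Assumes one full-covariance Gaussian per segment. With this sign
    convention, a negative value favors merging the two segments.
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    dim = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood covariance estimate (bias=True divides by N).
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Free parameters of one Gaussian: mean plus symmetric covariance.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    gain = 0.5 * (n_z * logdet_cov(z)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))
    return gain - penalty


def bic_cluster(segments, penalty_weight=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative.

    Returns a list of clusters, each a list of original segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # Assumed stopping rule: no remaining pair favors a merge.
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

Each segment is expected to be a two-dimensional array of feature vectors with enough frames for a stable covariance estimate. The penalty weight controls how readily clusters merge and would normally be tuned on development data.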

Bibliographic reference.  Žibert, Janez / Mihelič, France (2011): "Prosodic and phonetic features for speaker clustering in speaker diarization systems", In INTERSPEECH-2011, 1033-1036.