INTERSPEECH 2006 - ICSLP
Speaker diarization for recordings made in meetings consists of identifying the number of participants in each meeting and creating a list of speech time intervals for each participant. In recently published work we presented some experiments using only TDOA values (Time Delay Of Arrival between different channels) applied to this task. We demonstrated that the information in those values can be used to segment the speakers. In this paper we develop a method to mix the TDOA values with the acoustic values by calculating a combined log-likelihood between both sets of vectors. Using this method we have been able to reduce the diarization error rate (DER) by 16.34% (relative) for the NIST RT05s set (scored without overlap and with manually transcribed references), by 21% (relative) for our devel06s set (scored with overlap and with force-aligned references), and by 15% (relative) for the NIST RT06s set (scored with overlap and with manually transcribed references).
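As an illustration of what a combined log-likelihood over the two feature streams could look like, the following is a minimal sketch and not the authors' implementation. It assumes that each speaker cluster has one diagonal-covariance Gaussian mixture model per stream and that the two per-frame log-likelihoods are mixed with weights that sum to one. The function names, the weight value, and the toy data are all hypothetical; the abstract does not specify them.

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D), weights: (K,), means and variances: (K, D).
    """
    frames = np.atleast_2d(frames)
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2) + log_norm)
    log_weighted = log_comp + np.log(weights)                # (T, K)
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]


def combined_loglik(ll_acoustic, ll_tdoa, w_acoustic=0.9):
    """Weighted sum of the two streams' log-likelihoods.

    w_acoustic is an illustrative value (assumption), not one taken from the abstract.
    """
    return w_acoustic * ll_acoustic + (1.0 - w_acoustic) * ll_tdoa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acoustic = rng.normal(size=(5, 3))   # toy acoustic vectors (hypothetical)
    tdoa = rng.normal(size=(5, 2))       # toy TDOA vectors (hypothetical)

    # One-component models per stream for a single hypothetical cluster.
    ll_ac = diag_gmm_loglik(acoustic, np.array([1.0]), np.zeros((1, 3)), np.ones((1, 3)))
    ll_td = diag_gmm_loglik(tdoa, np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
    print(combined_loglik(ll_ac, ll_td))
```

Mixing in the log domain keeps the two streams' models separate, so each can have its own dimensionality and number of mixture components, while a single weight controls how much the TDOA stream influences the score.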
Bibliographic reference. Pardo, Jose M. / Anguera, Xavier / Wooters, Chuck (2006): "Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences", In INTERSPEECH-2006, paper 1337-Thu1A1O.5.