Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features and Inter-Channel Time Differences

Jose M. Pardo (1,2), Xavier Anguera (1,3), Chuck Wooters (1)

(1) International Computer Science Institute, USA
(2) Universidad Politécnica de Madrid, Spain
(3) Technical University of Catalonia, Barcelona, Spain

Speaker diarization for meeting recordings consists of identifying the number of participants in each meeting and producing a list of speech time intervals for each participant. In recently published work [7] we presented experiments that applied only TDOA values (Time Delay Of Arrival for the different channels) to this task, and showed that the information in those values can be used to segment the speakers. In this paper we develop a method that combines the TDOA values with the acoustic features by computing a combined log-likelihood over both sets of feature vectors. Using this method we reduce the diarization error rate (DER) by 16.34% relative on the NIST RT05s set (scored without overlap against manually transcribed references), by 21% relative on our devel06s set (scored with overlap against force-aligned references), and by 15% relative on the NIST RT06s set (scored with overlap against manually transcribed references).
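The abstract does not state the exact rule used to combine the two streams. A common way to fuse two frame-synchronous feature streams in a GMM-based diarization system is a weighted sum of the per-stream log-likelihoods, and the sketch below assumes that form, with one diagonal-covariance GMM per stream. The function names, the GMM representation and the default weight of 0.9 are hypothetical illustrations, not values or code taken from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical GMM representation: list of (weight, means, variances),
# each component having a diagonal covariance.
Gmm = List[Tuple[float, List[float], List[float]]]


def log_gauss_diag(x: List[float], mean: List[float], var: List[float]) -> float:
    """Log density of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x: List[float], gmm: Gmm) -> float:
    """Log-likelihood of one frame under a GMM, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def segment_loglik(frames: List[List[float]], gmm: Gmm) -> float:
    """Total log-likelihood of a segment (frames treated as independent)."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames)


def combined_loglik(
    acoustic_frames: List[List[float]],
    tdoa_frames: List[List[float]],
    acoustic_gmm: Gmm,
    tdoa_gmm: Gmm,
    w_acoustic: float = 0.9,  # ASSUMPTION: hypothetical weight, not from the paper
) -> float:
    """ASSUMED fusion rule: w * logL_acoustic + (1 - w) * logL_tdoa."""
    if not 0.0 <= w_acoustic <= 1.0:
        raise ValueError("w_acoustic must lie in [0, 1]")
    if len(acoustic_frames) != len(tdoa_frames):
        raise ValueError("the two streams must be frame-synchronous")
    la = segment_loglik(acoustic_frames, acoustic_gmm)
    lt = segment_loglik(tdoa_frames, tdoa_gmm)
    return w_acoustic * la + (1.0 - w_acoustic) * lt
```

With `w_acoustic=1.0` the score reduces to an acoustic-only system and with `w_acoustic=0.0` to a TDOA-only system, so a single weight spans the two setups the abstract contrasts; in practice such a weight would be tuned on development data.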


Bibliographic reference.  Pardo, Jose M. / Anguera, Xavier / Wooters, Chuck (2006): "Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences", In INTERSPEECH-2006, paper 1337-Thu1A1O.5.