Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Tracking and Beamforming for Multiple Simultaneous Speakers with Probabilistic Data Association Filters

Tobias Gehrig, Ulrich Klee, John W. McDonough, Shajith Ikbal, Matthias Wölfel, Christian Fügen

Universitat Karlsruhe, Germany

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, the JPDAF speaker tracking system reduced the multiple object tracking error from 20.7% to 14.3% with respect to the IEKF system. In a set of automatic speech recognition experiments conducted on the output of a 64 channel microphone array which was beamformed using automatic speaker position estimates, applying the JPDAF tracking system reduced word error rate from 67.3% to 66.0%. Moreover, the word error rate on the beamformed output was 13.0% absolute lower than on a single channel of the array.

Full Paper

Bibliographic reference.  Gehrig, Tobias / Klee, Ulrich / McDonough, John W. / Ikbal, Shajith / Wölfel, Matthias / Fügen, Christian (2006): "Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters", In INTERSPEECH-2006, paper 2038-Thu2FoP.5.