ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2006)

Pittsburgh, PA, USA
September 16, 2006

The Iroquois Model: Using Temporal Dynamics to Separate Speakers

Steven Rennie, Peder Olsen, John Hershey, Trausti Kristjansson

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

We describe a system that can separate and recognize the simultaneous speech of two speakers from a single-channel recording, and we compare its performance to that of human subjects. The system, which we call Iroquois, uses models of temporal dynamics to achieve performance near that of human listeners. However, the system exhibits a pattern of performance across conditions that differs from that of human subjects. In conditions where the two speakers have similar amplitudes, the Iroquois model surpasses human performance by over 50%. We hypothesize that the system accomplishes this by employing a strategy different from that of the human auditory system.


Bibliographic reference. Rennie, Steven / Olsen, Peder / Hershey, John / Kristjansson, Trausti (2006): "The Iroquois model: using temporal dynamics to separate speakers", In SAPA-2006, 24-30.