Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

T. Kristjansson, J. Hershey, P. Olsen, S. Rennie, Ramesh Gopinath

IBM T.J. Watson Research Center, USA

We describe a system for model-based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in the separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.


Bibliographic reference. Kristjansson, T. / Hershey, J. / Olsen, P. / Rennie, S. / Gopinath, Ramesh (2006): "Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system", in INTERSPEECH-2006, paper 1775-Mon1WeS.7.