13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Unconstrained Speech Separation by Composition of Longest Segments

Ming Ji, Ramji Srinivasan, Danny Crookes

Institute of Electronics, Communications and Information Technology, Queen's University Belfast, Belfast, UK

A data-driven approach is presented for improving the performance of separating single-channel mixed speech signals, assuming unknown, arbitrary temporal dynamics. The new approach seeks and separates the longest mixed speech segments which can be accurately matched by composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the matching constituent training segments, and hence the error of separation. Experiments are conducted on the Wall Street Journal database, for separating mixtures of large-vocabulary speech utterances. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.

Index Terms: Temporal dynamics, longest matching segment, speech separation, speech recognition
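
The longest-matching-segment idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, whose matching criterion and models are not given on this page. The sketch assumes that frames are log-spectral vectors, that a mixture frame is approximated by the element-wise maximum of the two source frames (the common log-max approximation), and that a match is accepted when the largest per-bin error stays within a tolerance. The function names, the brute-force search, and the `tol` parameter are all hypothetical.

```python
def frame_error(mix_frame, a_frame, b_frame):
    """Largest per-bin gap between a mixture frame and the composite frame.

    Assumption: the composite of two log-spectral frames is their
    element-wise maximum (log-max approximation).
    """
    return max(abs(m - max(a, b)) for m, a, b in zip(mix_frame, a_frame, b_frame))


def longest_composite_match(mix, train_a, train_b, t, tol=1e-6):
    """Find the longest mixture segment starting at frame t that is matched
    by a composite of one segment from train_a and one from train_b.

    Returns (length, i, j): the segment length and the start frames of the
    matching training segments. Brute force over all start pairs (i, j).
    """
    best = (0, -1, -1)
    for i in range(len(train_a)):
        for j in range(len(train_b)):
            n = 0
            # Extend the segment frame by frame while the composite still matches.
            while (t + n < len(mix)
                   and i + n < len(train_a)
                   and j + n < len(train_b)
                   and frame_error(mix[t + n], train_a[i + n], train_b[j + n]) <= tol):
                n += 1
            if n > best[0]:
                best = (n, i, j)
    return best
```

Under these assumptions, a short mixture frame may be matched by many training pairs, while a longer segment is matched by fewer, which is the sense in which lengthening the segment reduces the uncertainty of the constituent segments. Once a pair (i, j) is found, the matching training segments themselves serve as the estimates of the two sources over that span.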

Bibliographic reference. Ji, Ming / Srinivasan, Ramji / Crookes, Danny (2012): "Unconstrained speech separation by composition of longest segments", in INTERSPEECH-2012, 1540-1543.