Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

BINSEG: An Efficient Speaker-Based Segmentation Technique

Jindrich Zdansky

Technical University of Liberec, Czech Republic

In this paper we present a new, efficient approach to speaker-based segmentation of audio streams. It employs the binary segmentation technique, which is well known in mathematical statistics. Because hypothesis testing is an integral part of this technique, we compare two well-founded approaches (Maximum Likelihood and Informational) and one commonly used approach (BIC difference) for deriving speaker-change test statistics. Based on the results of this comparison, we propose both off-line and on-line speaker-change detection algorithms (including an effective way of training them) that offer high accuracy at low computational cost. In simulated tests with artificially mixed data, the on-line algorithm identified 95.7% of all speaker changes with a precision of 96.9%. In tests on 30 hours of real broadcast news (in 9 languages), the average recall was 74.4% and the precision was 70.3%.
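To illustrate how binary segmentation and a change-point test statistic fit together, here is a minimal sketch. It is not the BINSEG algorithm from the paper: it assumes a single 1-D Gaussian feature stream (real systems use multivariate acoustic features such as MFCCs) and uses only the BIC-difference statistic, one of the three the paper compares. The names `delta_bic`, `binary_segmentation`, `min_len` and `lam` are hypothetical. The procedure finds the split point with the largest positive ΔBIC, accepts it as a change, and then recurses on both halves.

```python
import math
from typing import List


def _var(xs: List[float]) -> float:
    """Maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    return max(sum((x - mean) ** 2 for x in xs) / n, 1e-8)


def delta_bic(xs: List[float], t: int, lam: float = 1.0) -> float:
    """BIC difference for splitting xs at index t (1-D Gaussian model, assumed).

    Positive values favour the two-segment (speaker change) hypothesis.
    """
    n, n1, n2 = len(xs), t, len(xs) - t
    d = 1  # feature dimension in this toy example
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return (0.5 * n * math.log(_var(xs))
            - 0.5 * n1 * math.log(_var(xs[:t]))
            - 0.5 * n2 * math.log(_var(xs[t:]))
            - lam * penalty)


def binary_segmentation(xs: List[float], offset: int = 0,
                        min_len: int = 20, lam: float = 1.0) -> List[int]:
    """Recursively split xs at the best ΔBIC point while ΔBIC stays positive.

    Returns change-point indices relative to the original stream.
    """
    if len(xs) < 2 * min_len:
        return []
    best_t, best_score = None, 0.0
    # Every candidate split leaves at least min_len samples on each side.
    for t in range(min_len, len(xs) - min_len + 1):
        score = delta_bic(xs, t, lam)
        if score > best_score:
            best_t, best_score = t, score
    if best_t is None:  # no split beats the penalty: segment is homogeneous
        return []
    left = binary_segmentation(xs[:best_t], offset, min_len, lam)
    right = binary_segmentation(xs[best_t:], offset + best_t, min_len, lam)
    return left + [offset + best_t] + right


if __name__ == "__main__":
    # Two synthetic "speakers": values around 0.5, then values around 10.5.
    stream = [float(i % 2) for i in range(100)] + [10.0 + (i % 2) for i in range(100)]
    print(binary_segmentation(stream))  # -> [100]
```

The recursion is what makes binary segmentation cheap compared with testing every window exhaustively: each accepted change splits the problem in two, and a homogeneous segment stops the recursion after a single pass over its candidate points. This sketch scans all candidates naively in O(n²) time. An efficient implementation would keep running sums instead.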


Bibliographic reference: Zdansky, Jindrich (2006): "BINSEG: an efficient speaker-based segmentation technique", In INTERSPEECH-2006, paper 1459-Thu1A1O.2.