ITRW on Experimental Linguistics
In this paper we propose a new segmentation technique called ISI ("Interlaced Speech Indexing"), developed and implemented for the task of broadcast news indexing. The task consists in identifying a given speaker and locating the moments of his interventions within an audio document, so that his speech can be accessed quickly, directly and easily. Our segmentation procedure is based on an interlaced equidistant segmentation (IES) combined with our new ISI algorithm. The approach uses a speaker identification method based on Second Order Statistical Measures (SOSM). Among the SOSM measures, we chose "µGc", which is based on the covariance matrix. Experiments showed, however, that this method needs at least 2 seconds of speech, which by itself would limit the segmentation resolution to 2 seconds. By combining the SOSM with the new indexing technique (ISI), we demonstrate that the average segmentation error is reduced to only 0.5 second, which is more accurate and better suited to real-time applications. Results indicate that this combination provides high resolution and high tracking performance: the indexing score (percentage of correctly labelled segments) is 95% on the TIMIT database and 92.4% on the Hub4 Broadcast News 96 database.
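As an illustration of why interlacing can give a boundary resolution finer than the 2-second analysis window, the following minimal sketch slides overlapping 2-second windows along the recording and attributes each window's decision to the short interval around its centre. It is not the authors' ISI algorithm, whose details are not given in this abstract. The 1-second shift, the function names (`interlaced_windows`, `index_speakers`, `toy_identify`) and the centre-attribution rule are all assumptions made for the example, and `toy_identify` is a stand-in for a real SOSM-based speaker identifier.

```python
def interlaced_windows(duration, win=2.0, shift=1.0):
    """Return (start, end) windows of length `win`, spaced `shift` apart.

    `win` follows the 2-second minimum stated in the abstract;
    `shift` is an assumed value chosen only for illustration.
    """
    if duration < win:
        return []
    count = int((duration - win) / shift + 1e-9) + 1
    return [(i * shift, i * shift + win) for i in range(count)]


def index_speakers(duration, identify, win=2.0, shift=1.0):
    """Label the recording and return merged (start, end, speaker) segments.

    Each window is labelled by `identify(start, end)`; the label is then
    attributed to the `shift`-long interval centred on the window
    (an assumed fusion rule, not taken from the paper).
    """
    duration = float(duration)
    windows = interlaced_windows(duration, win, shift)
    segments = []
    for i, (start, end) in enumerate(windows):
        label = identify(start, end)
        centre = (start + end) / 2
        # Stretch the first and last intervals to the edges of the recording.
        lo = 0.0 if i == 0 else centre - shift / 2
        hi = duration if i == len(windows) - 1 else centre + shift / 2
        if segments and segments[-1][2] == label:
            # Same speaker as the previous interval: extend that segment.
            segments[-1] = (segments[-1][0], hi, label)
        else:
            segments.append((lo, hi, label))
    return segments


def toy_identify(start, end, change=3.2):
    """Hypothetical identifier: speaker "A" talks before `change`, "B" after.

    Returns whichever speaker occupies the larger part of the window.
    """
    a_time = max(0.0, min(end, change) - start)
    b_time = (end - start) - a_time
    return "A" if a_time >= b_time else "B"


if __name__ == "__main__":
    for segment in index_speakers(6.0, toy_identify):
        print(segment)
```

In this toy run the true speaker change is at 3.2 s and the estimated boundary falls at 3.5 s, so the error stays below half the assumed shift even though every decision is made on a full 2-second window.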
Bibliographic reference. Ouamour, S. / Guerti, M. / Sayoud, H. (2006): "Speaker based segmentation on broadcast news - on the use of ISI technique", In ExLing-2006, 193-196.