2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)
This paper presents an approach that minimizes human involvement
in the manual annotation of speakers. At each iteration,
a selection strategy chooses the most suitable speech track
for manual annotation, and the resulting label is then associated with all the
tracks in the cluster that contains it. The study makes use of
a system that propagates the speaker track labels using
agglomerative clustering with constraints. Several different
unsupervised active learning selection strategies are evaluated.
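The annotation loop described above can be illustrated with a minimal sketch. It assumes the clustering is already given and stays fixed, whereas the paper relies on agglomerative clustering with constraints. The selection rule used here (pick the unlabeled cluster with the most speech, then its longest track) is a hypothetical placeholder, not one of the strategies evaluated in the paper. The names `select_track`, `annotate`, `oracle`, and `budget` are likewise illustrative.

```python
from typing import Callable, Dict, List, Optional


def select_track(clusters: Dict[int, List[str]],
                 durations: Dict[str, float],
                 labels: Dict[str, str]) -> Optional[str]:
    """Hypothetical strategy: choose the unlabeled cluster with the most
    total speech and return its longest track."""
    unlabeled = {cid: tracks for cid, tracks in clusters.items()
                 if not any(t in labels for t in tracks)}
    if not unlabeled:
        return None
    best = max(unlabeled,
               key=lambda cid: sum(durations[t] for t in unlabeled[cid]))
    return max(unlabeled[best], key=lambda t: durations[t])


def annotate(clusters: Dict[int, List[str]],
             durations: Dict[str, float],
             oracle: Callable[[str], str],
             budget: int) -> Dict[str, str]:
    """Ask the oracle (the human annotator) for at most `budget` labels and
    propagate each label to every track in the selected track's cluster."""
    labels: Dict[str, str] = {}
    track_to_cluster = {t: cid for cid, tracks in clusters.items()
                        for t in tracks}
    for _ in range(budget):
        track = select_track(clusters, durations, labels)
        if track is None:  # every cluster already carries a label
            break
        name = oracle(track)  # one manual annotation
        for t in clusters[track_to_cluster[track]]:
            labels[t] = name  # propagation within the cluster
    return labels


# Toy usage: two clusters, three tracks, a dictionary standing in for the human.
clusters = {0: ["a", "b"], 1: ["c"]}
durations = {"a": 5.0, "b": 3.0, "c": 2.0}
truth = {"a": "Alice", "b": "Alice", "c": "Bob"}
print(annotate(clusters, durations, truth.__getitem__, budget=1))
```

With a budget of one annotation, only the larger cluster is labeled, so two tracks are covered at the cost of a single human decision. A propagated label is only correct when the cluster is pure, which is why the quality of the clustering matters in this setting.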
Additionally, the presented approach can be used to efficiently generate sets of speech tracks for training biometric models. In this case, both the total length of the speech tracks for a given person and their purity are taken into consideration.
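As a rough illustration of the two quantities mentioned above, the sketch below computes, for each person, the total duration of the tracks assigned to them and a duration-weighted purity. The purity definition used here (the share of assigned speech time that truly belongs to that person) is an assumption made for the example; the paper may define it differently. The function name `length_and_purity` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def length_and_purity(
    tracks: Iterable[Tuple[str, str, float]]
) -> Dict[str, Tuple[float, float]]:
    """tracks: (assigned_name, true_name, duration_seconds) triples.

    Returns, per assigned name, (total duration, purity), where purity is
    assumed to be the fraction of that duration whose true speaker matches.
    """
    total = defaultdict(float)
    correct = defaultdict(float)
    for assigned, true, duration in tracks:
        total[assigned] += duration
        if assigned == true:
            correct[assigned] += duration
    # Skip persons with zero total duration to avoid dividing by zero.
    return {name: (total[name], correct[name] / total[name])
            for name in total if total[name] > 0}


# Toy usage: one of the tracks assigned to Alice actually belongs to Bob.
stats = length_and_purity([
    ("Alice", "Alice", 6.0),
    ("Alice", "Bob", 2.0),
    ("Bob", "Bob", 4.0),
])
print(stats)
```

A training set for a biometric model would favor persons with both a long total duration and a purity close to 1.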
The system was evaluated on the REPERE video corpus. In addition to the speech tracks extracted from the videos, an optical character recognition system was adapted to extract the names of potential speakers. These names were then used as the cold start for the selection method.
Index Terms: active learning, annotation propagation, clustering, speaker identification
Bibliographic reference. Budnik, Mateusz / Poignant, Johann / Besacier, Laurent / Quénot, Georges (2014): "Active selection with label propagation for minimizing human effort in speaker annotation of TV shows", In SLAM-2014, 43-47.