First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)
In this paper, we present the results obtained by a state-of-the-art system for Speaker Role Recognition (SRR) on the TV broadcast documents from the REPERE Multimedia Challenge. This SRR system is based on the assumption that cues about speaker roles can be extracted from a set of 36 low-level features derived from the outputs of a Speaker Diarization process. Starting from manually annotated speaker segments, we first evaluate the performance of the SRR system, previously evaluated on broadcast radio recordings, on this heterogeneous set of TV shows. Following this evaluation, we propose a new classification strategy and observe how building show-dependent models improves SRR. The system is then applied to speaker segmentation outputs produced by an automatic system, enabling us to investigate the influence of the errors introduced by this front-end process on role recognition. Across these contexts, the system correctly classifies 86.9% of speaker roles when applied to manual speaker segmentations and 74.5% when applied to automatic Speaker Diarization outputs.
Index Terms: speaker role recognition, speech processing, content-based indexing of audiovisual documents.
Bibliographic reference. Bigot, Benjamin / Fredouille, Corinne / Charlet, Delphine (2013): "Speaker role recognition on TV broadcast documents", In SLAM-2013, 66-71.