13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

I-vectors and ILP Clustering Adapted to Cross-show Speaker Diarization

Grégor Dupuy, Mickael Rouvier, Sylvain Meignier, Yannick Estève

LUNAM Université, LIUM, Le Mans, France

We study speaker diarization over a collection of audio documents, where the goal is to detect speakers who appear in several shows. In our approach, each show in the collection is first processed separately; the collection is then processed as a whole to group speakers involved in several shows. Two clustering methods are studied for this collection-level processing: one uses the NCLR metric, and the other is inspired by i-vector techniques, which are mainly used in the speaker verification field. Both methods were evaluated on the entire ESTER 2 training corpus. The i-vector-based method achieves error rates similar to those of the NCLR method while running on average 7.46 times faster. This method is therefore suitable for processing large volumes of data.

Index Terms: speaker diarization, cross-show diarization, i-vectors, ILP clustering.
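
The two-pass structure described in the abstract (per-show processing, then collection-level grouping) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy 2-dimensional vectors standing in for i-vectors, the cosine distance, the threshold value, and the greedy assignment rule. It is not the ILP formulation or the NCLR method evaluated in the paper, only a simple stand-in showing how per-show speaker labels could be mapped to collection-wide speaker identifiers.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cross_show_grouping(show_clusters, threshold=0.3):
    """Greedy stand-in for the collection-level clustering pass.

    show_clusters maps a show name to {local speaker label: vector},
    i.e. the hypothetical output of the per-show pass.
    Returns {(show, local label): global speaker id}.
    The threshold is an arbitrary illustrative value.
    """
    centers = []      # (global id, representative vector) pairs
    assignment = {}
    for show in sorted(show_clusters):
        for label in sorted(show_clusters[show]):
            vec = show_clusters[show][label]
            best_id, best_dist = None, threshold
            # Look for the closest existing global speaker under the threshold.
            for gid, center in centers:
                dist = cosine_distance(vec, center)
                if dist < best_dist:
                    best_id, best_dist = gid, dist
            # No close match: open a new global speaker.
            if best_id is None:
                best_id = len(centers)
                centers.append((best_id, vec))
            assignment[(show, label)] = best_id
    return assignment


# Toy input: two shows, each already diarized into local speakers.
shows = {
    "show_a": {"S0": [1.0, 0.0], "S1": [0.0, 1.0]},
    "show_b": {"S0": [0.9, 0.1], "S1": [-1.0, 0.0]},
}
mapping = cross_show_grouping(shows)
```

In this toy input, the first speaker of each show has nearly the same vector, so both receive the same global identifier, while the remaining speakers stay distinct. A greedy rule of this kind depends on processing order and can merge two speakers from the same show; a global formulation such as the ILP clustering named in the title avoids order dependence by optimizing over all clusters at once.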

Bibliographic reference. Dupuy, Grégor / Rouvier, Mickael / Meignier, Sylvain / Estève, Yannick (2012): "I-vectors and ILP clustering adapted to cross-show speaker diarization", in INTERSPEECH-2012, 2174-2177.