Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

A Spectral Clustering Approach to Speaker Diarization

Huazhong Ning, Ming Liu, Hao Tang, Thomas S. Huang

University of Illinois at Urbana-Champaign, USA

In this paper, we present a spectral clustering approach that explores the possibility of discovering structure in audio data. To apply the Ng-Jordan-Weiss (NJW) spectral clustering algorithm to speaker diarization, we propose domain-specific solutions to three open issues of this algorithm: the choice of metric, the selection of the scaling parameter, and the estimation of the number of clusters. A post-processing step, "Cross EM refinement", is then applied to further improve the performance of spectral learning. In experiments on audio data from Japanese Parliament Panel Discussions, this approach performs very similarly to traditional hierarchical clustering but runs much faster.
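For reference, the generic NJW algorithm can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Euclidean distance between feature vectors stands in for their domain-specific metric. The scaling parameter `sigma` is supplied by the caller rather than selected automatically. The number of clusters is chosen with the common largest-eigengap heuristic, which is an assumption and not necessarily the paper's estimator. Cross EM refinement is omitted. The function name `njw_spectral_clustering` and its parameters are hypothetical.

```python
import numpy as np


def njw_spectral_clustering(features, sigma, max_clusters=10, n_iter=100):
    """Generic NJW spectral clustering sketch (hypothetical helper).

    Assumptions not taken from the paper: Euclidean metric, caller-supplied
    sigma, eigengap rule for the number of clusters, no Cross EM refinement.
    Returns (labels, k).
    """
    X = np.asarray(features, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed metric).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Step 1: Gaussian affinity with scaling parameter sigma, zero diagonal.
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: normalized affinity L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; reverse to descending.
    vals, vecs = np.linalg.eigh(L)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Assumed cluster-count rule: largest gap among the top eigenvalues.
    m = min(max_clusters, n - 1)
    gaps = vals[:m] - vals[1:m + 1]
    k = int(np.argmax(gaps)) + 1
    # Steps 3-4: stack the top-k eigenvectors and renormalize rows to unit length.
    Y = vecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Step 5: k-means on the rows of Y, with deterministic farthest-point init.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, k
```

Each row of `features` would be a vector summarizing one audio segment. How segments are represented is also an assumption here. The eigengap rule works well when the affinity matrix is close to block-diagonal, which is why the choice of `sigma` matters so much in practice.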


Bibliographic reference.  Ning, Huazhong / Liu, Ming / Tang, Hao / Huang, Thomas S. (2006): "A spectral clustering approach to speaker diarization", In INTERSPEECH-2006, paper 1607-Thu1A1O.1.