13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Fully Bayesian Speaker Clustering Based on Hierarchically Structured Utterance-oriented Dirichlet Process Mixture Model

Naohiro Tawara (1), Tetsuji Ogawa (1), Shinji Watanabe (2,3), Atsushi Nakamura (2), Tetsunori Kobayashi (1)

(1) Waseda University, Tokyo, Japan
(2) NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan

We propose a novel Bayesian speaker clustering method based on a nonparametric Bayesian model with a hierarchical structure. We carried out preliminary speaker clustering experiments comparing the proposed method with conventional hierarchical agglomerative clustering based on the Bayesian information criterion (AHC-BIC). The experimental results showed that the proposed method was effective for data in which the number of utterances varied from speaker to speaker, whereas the conventional method suffered a significant degradation in clustering accuracy on such data.

Index Terms: speaker clustering, nonparametric Bayesian model, Gibbs sampling, utterance-oriented Dirichlet process mixture model.
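To make the clustering step concrete, here is a minimal sketch of collapsed Gibbs sampling for a Dirichlet process mixture in which whole utterances, not individual frames, are assigned to speaker clusters. This assumes, as the model's name suggests, that all frames of one utterance share a single speaker label. It is a toy stand-in and not the authors' model. Frames are 1-D, each speaker emits frames from a Gaussian with a known variance, and each speaker mean has a conjugate Gaussian prior. The hierarchical structure of the paper's model is not reproduced. All names and hyperparameters (`SIGMA2`, `M0`, `S02`, `ALPHA`, `gibbs`, `log_marginal`) are hypothetical.

```python
import math
import random

# Hypothetical hyperparameters for this toy sketch (not taken from the paper).
SIGMA2 = 1.0   # known frame-level variance within a speaker
M0 = 0.0       # prior mean of each speaker's mean
S02 = 10.0     # prior variance of each speaker's mean
ALPHA = 1.0    # Dirichlet process concentration parameter


def stats(frames):
    """Sufficient statistics (count, sum, sum of squares) of 1-D frames."""
    return (len(frames), sum(frames), sum(x * x for x in frames))


def add(a, b, sign=1):
    """Add (sign=1) or remove (sign=-1) sufficient statistics."""
    return tuple(x + sign * y for x, y in zip(a, b))


def log_marginal(st):
    """log p(frames) with the speaker mean integrated out (normal-normal model)."""
    n, s, q = st
    tau0 = 1.0 / S02
    taun = tau0 + n / SIGMA2
    mn = (tau0 * M0 + s / SIGMA2) / taun
    return (-0.5 * n * math.log(2 * math.pi * SIGMA2)
            + 0.5 * math.log(tau0 / taun)
            - 0.5 * (q / SIGMA2 + tau0 * M0 ** 2 - taun * mn ** 2))


def gibbs(utterances, iters=50, seed=0):
    """Collapsed Gibbs sampling that assigns whole utterances to clusters."""
    rng = random.Random(seed)
    ustats = [stats(u) for u in utterances]
    z = [0] * len(utterances)  # start with every utterance in one cluster
    clusters = {0: [len(utterances), (0, 0.0, 0.0)]}  # label -> [utterance count, stats]
    for st in ustats:
        clusters[0][1] = add(clusters[0][1], st)
    next_label = 1
    for _ in range(iters):
        for i, st in enumerate(ustats):
            # Remove utterance i from its current cluster.
            k = z[i]
            clusters[k][0] -= 1
            clusters[k][1] = add(clusters[k][1], st, -1)
            if clusters[k][0] == 0:
                del clusters[k]
            # CRP prior over utterance counts times the collapsed predictive
            # likelihood of all frames in the utterance.
            labels = list(clusters)
            logw = [math.log(clusters[c][0])
                    + log_marginal(add(clusters[c][1], st))
                    - log_marginal(clusters[c][1]) for c in labels]
            labels.append(next_label)  # option: open a new speaker cluster
            logw.append(math.log(ALPHA) + log_marginal(st))
            mx = max(logw)
            w = [math.exp(v - mx) for v in logw]
            choice = rng.choices(labels, weights=w)[0]
            if choice == next_label:
                clusters[choice] = [0, (0, 0.0, 0.0)]
                next_label += 1
            clusters[choice][0] += 1
            clusters[choice][1] = add(clusters[choice][1], st)
            z[i] = choice
    return z


if __name__ == "__main__":
    # Synthetic speakers with unequal numbers of utterances (6, 2 and 1).
    rng = random.Random(1)
    spk = [0] * 6 + [1] * 2 + [2]
    means = [-5.0, 0.0, 5.0]
    utts = [[rng.gauss(means[s], 1.0) for _ in range(20)] for s in spk]
    print(gibbs(utts))
```

Because the prior counts utterances rather than frames, a long utterance cannot outvote a short one merely by having more frames, and the number of clusters is inferred rather than fixed in advance. Cluster labels are arbitrary integers. Only the partition they induce is meaningful.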


Bibliographic reference.  Tawara, Naohiro / Ogawa, Tetsuji / Watanabe, Shinji / Nakamura, Atsushi / Kobayashi, Tetsunori (2012): "Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model", In INTERSPEECH-2012, 2166-2169.