INTERSPEECH 2006 - ICSLP
Usually, text-dependent speaker verification can achieve better performance than text-independent system because of the constraint that the enrollment and testing utterance share the same phonetic content. However, the enrollment data for text-dependent system usually is very limited. Expectation Maximization(EM) training of HMM will suffer from noisy estimation because of limited enrollment. Adaptation is a popular solution in this scenario. The target model is formed by adapting the generic model based on limited speaker specific training data. Although the adaptation scheme can tolerate much less training data than direct EM method, the traditional method does not account the topology of HMM might be different for different speaker. The topology information further distinguish the target speaker from impostors. In this paper, we propose a unsupervised learning method to learn the topology of HMM for each speaker. The experimental results indicate that with learning the topology, the framework is more effective than traditional adaptation methods. In the pure acoustic matching experiments, the proposed method is the best system under extremely small amount enrollment data (1 training utterance) and moderate training data. That mainly due to explicitly including the label information in background modeling and discriminant capability of unsupervised learning of HMM topology.
Bibliographic reference. Liu, Ming / Huang, Thomas S. (2006): "Unsupervised learning of HMM topology for text-dependent speaker verification", In INTERSPEECH-2006, paper 1302-Tue1CaP.5.