INTERSPEECH 2006 - ICSLP
Discriminating speaking styles is an important issue in speech recognition, speaker recognition and speaker segmentation. This paper compares distance measures between Gaussian distributions for discriminating speaking styles. The Mahalanobis distance, the Bhattacharyya distance and the Kullback-Leibler divergence, which are commonly used as distance measures between Gaussian distributions, are evaluated in terms of their accuracy in discriminating speaking styles. In this paper, accuracy is judged on a visualized map, in which speaking-style speech corpora are mapped onto a two-dimensional space by a multidimensional scaling method. It is shown that speaking-style clusters appear clearly grouped on the visualized maps obtained with the Bhattacharyya distance and the Kullback-Leibler divergence. In addition, the visualized map corresponds to speech recognition performance, and the Kullback-Leibler divergence shows higher sensitivity to recognition performance.
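As an illustration of the three measures compared above, the following is a minimal sketch, not the authors' implementation. It assumes diagonal-covariance Gaussians, each given as a list of means and a list of variances. The function names are hypothetical, and two choices are assumptions that the abstract does not specify: averaging the two variances for the Mahalanobis distance, and symmetrizing the Kullback-Leibler divergence by summing both directions.

```python
import math


def mahalanobis(mu1, var1, mu2, var2):
    """Mahalanobis distance between the means of two diagonal Gaussians.

    Assumption: the shared covariance is the average of the two variances.
    """
    return math.sqrt(sum(
        (a - b) ** 2 / ((v1 + v2) / 2.0)
        for a, v1, b, v2 in zip(mu1, var1, mu2, var2)
    ))


def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal Gaussians."""
    total = 0.0
    for a, v1, b, v2 in zip(mu1, var1, mu2, var2):
        v = (v1 + v2) / 2.0
        # Mean term plus covariance term, per dimension.
        total += (a - b) ** 2 / (8.0 * v)
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total


def kl_divergence(mu1, var1, mu2, var2):
    """Kullback-Leibler divergence KL(N1 || N2) for diagonal Gaussians."""
    return 0.5 * sum(
        math.log(v2 / v1) + (v1 + (a - b) ** 2) / v2 - 1.0
        for a, v1, b, v2 in zip(mu1, var1, mu2, var2)
    )


def symmetric_kl(mu1, var1, mu2, var2):
    """Symmetrized KL divergence (assumption: sum of both directions)."""
    return (kl_divergence(mu1, var1, mu2, var2)
            + kl_divergence(mu2, var2, mu1, var1))
```

In this sketch, two Gaussians with identical means but different variances have a Mahalanobis distance of zero, while the Bhattacharyya distance and the Kullback-Leibler divergence remain positive because both include a term that depends on the variances. A matrix of such pairwise distances between corpus models is the kind of input that a multidimensional scaling method would then project onto two dimensions.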
Bibliographic reference. Nagino, Goshu / Shozakai, Makoto (2006): "Distance measure between Gaussian distributions for discriminating speaking styles", In INTERSPEECH-2006, paper 1383-Mon3CaP.6.