Speech Prosody 2006

Dresden, Germany
May 2-5, 2006

Exploring Expressive Speech Space in an Audio-book

Lijuan Wang (2), Yong Zhao (1), Min Chu (1), Yining Chen (1), Frank K. Soong (1), Zhigang Cao (2)

(1) Microsoft Research Asia, Beijing, China
(2) Department of Electronic Engineering, Tsinghua University, Beijing, China

In this paper, an audio-book in which a single professional voice talent performs multiple characters is exploited to investigate the expressiveness of speech. The expressive speech space of this single speaker is explored by measuring the distances between the acoustic models of the characters and the perceived proximity between their speech utterances. Using the speech of ten characters as test data, character confusability is evaluated in both the acoustic space and the perceptual space. We find that the average precision in differentiating one character from the others is 81.7% in the acoustic space and 72.6% in the perceptual space; interestingly, the objective measure outperforms the subjective one. Furthermore, the acoustic distance between two characters, measured by the normalized Kullback-Leibler divergence (NKLD), is highly correlated with the perceptual distance, with a correlation coefficient of 0.814. NKLD can therefore serve as an objective measure of the perceptual similarity between groups of utterances.
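The paper's exact NKLD formulation (computed over the characters' acoustic models) is not reproduced in this abstract; as a minimal illustrative sketch, the symmetrized Kullback-Leibler divergence that such distance measures build on can be written in closed form for univariate Gaussians. The function names and the univariate-Gaussian simplification below are assumptions for illustration, not the authors' implementation.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(p || q) between two univariate
    Gaussians p = N(mu1, var1) and q = N(mu2, var2)."""
    return 0.5 * math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2) - 0.5

def symmetric_kl(mu1, var1, mu2, var2):
    """Symmetrized KL, a common way to turn the asymmetric KL divergence
    into a distance-like measure between two acoustic distributions."""
    return kl_gauss(mu1, var1, mu2, var2) + kl_gauss(mu2, var2, mu1, var1)

# Identical distributions give zero distance; differing ones a positive value.
print(symmetric_kl(0.0, 1.0, 0.0, 1.0))  # 0.0
print(symmetric_kl(0.0, 1.0, 1.0, 2.0))
```

In practice such divergences are computed between full acoustic models (e.g. state-level Gaussian mixtures) and normalized, which is where the "N" in NKLD comes from; the sketch above only shows the underlying divergence.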

Bibliographic reference.  Wang, Lijuan / Zhao, Yong / Chu, Min / Chen, Yining / Soong, Frank K. / Cao, Zhigang (2006): "Exploring expressive speech space in an audio-book", in SP-2006, paper 182.