Speech Prosody 2002

Aix-en-Provence, France
April 11-13, 2002

Performance Improvement in Estimating Subjective Agedness with Prosodic Features

Nobuaki Minematsu (1), Mariko Sekiguchi (2), Keikichi Hirose (2)

(1) Graduate School of Information Science and Technology; (2) Graduate School of Frontier Sciences, University of Tokyo, Japan

In this paper, we propose a technique that automatically estimates speakers’ agedness using only the acoustic, not the linguistic, information in their utterances. The method integrates GMM (Gaussian Mixture Model)-based speaker recognition techniques with modules that calculate prosody-based agedness scores. We first divided the speakers of two databases, JNAS and S(senior)-JNAS, into two groups by means of listening tests. One group contains only those speakers whose speech sounds so aged that a listener would take special care when talking to them. The other group contains the remaining speakers of the two databases. Each speaker group was then modeled with a GMM. In experiments on automatic identification of the speaker group, the correct identification rate was 91%. To improve the performance, two prosodic features were considered, i.e., speech rate and local perturbation of power. Using these features raised the identification rate to 95%. Finally, scores calculated by integrating the GMM and the prosodic modules were used in experiments to automatically estimate speakers’ agedness. The results showed a high correlation between the speakers’ agedness as judged subjectively by human listeners and the scores calculated automatically with the proposed method.
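The abstract does not specify how the GMM and prosodic modules are combined, so the following is only a minimal sketch under stated assumptions: each speaker group is represented by a diagonal-covariance GMM, the acoustic evidence is the average per-frame log-likelihood ratio between the two group models, and the two prosodic features are added as a weighted linear term. The function names, the weights `w_rate` and `w_power`, and the toy model parameters are all hypothetical and are not taken from the paper.

```python
import math

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            lp = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                lp += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(lp)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def agedness_score(frames, aged_gmm, other_gmm,
                   speech_rate, power_perturbation,
                   w_rate=0.0, w_power=0.0):
    """Hypothetical integration: GMM log-likelihood ratio plus a
    weighted sum of the two prosodic features (weights are assumptions)."""
    llr = gmm_loglik(frames, *aged_gmm) - gmm_loglik(frames, *other_gmm)
    return llr + w_rate * speech_rate + w_power * power_perturbation

# Toy one-dimensional, single-component models (illustrative values only).
aged_gmm = ([1.0], [[0.0]], [[1.0]])    # (weights, means, variances)
other_gmm = ([1.0], [[2.0]], [[1.0]])
frames = [[0.0]]                        # one acoustic feature frame

acoustic_only = agedness_score(frames, aged_gmm, other_gmm, 0.0, 0.0)
combined = agedness_score(frames, aged_gmm, other_gmm,
                          speech_rate=1.0, power_perturbation=2.0,
                          w_rate=-0.5, w_power=0.5)
is_aged_group = combined > 0.0          # threshold at 0 is an assumption
```

In this sketch a positive score favors the aged-sounding group. In practice the weights and the decision threshold would have to be tuned on held-out data, and the continuous score could be compared against listeners' subjective ratings.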

Bibliographic reference. Minematsu, Nobuaki / Sekiguchi, Mariko / Hirose, Keikichi (2002): "Performance improvement in estimating subjective agedness with prosodic features", in SP-2002, 507-510.