Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Joint Prosodic and Spectral Modeling for Robust Speaker Verification

Yuan-Fu Liao (1), Wen-Chieh Chang (2), Zong-You Xie (1), Ding-Yun Zeng (1), Yau-Tarng Juang (2)

(1) Department of Electronic Engineering, National Taipei University of Technology, Taiwan
(2) Department of Electrical Engineering, National Central University, Taiwan

In this paper, a joint prosodic and spectral modeling framework is proposed as an alternative to traditional score-domain fusion approaches, in order to alleviate the problem of mismatched channel, handset, and ambient-noise conditions. The basic idea is to embed the hierarchical structure of speech prosody into an ergodic HMM (EHMM), and to model the prosodic state transitions and the prosodic/spectral features by the EHMM's states, state transition probabilities, and state-dependent observation distributions, respectively. Experiments on the standard single-speaker detection task of the NIST 2001 Speaker Recognition Evaluation (NIST-SRE 2001) showed that the proposed approach not only outperformed the spectral-feature-based baseline (8.04% vs. 8.64% equal error rate, EER) but also performed slightly better than the score-domain fusion approach (8.44%).
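As a rough illustration of the modeling idea, the following minimal sketch scores a sequence of joint feature frames with a fully connected (ergodic) HMM using the forward algorithm in the log domain. It is not the authors' implementation. The diagonal-Gaussian observation model, the two-state toy parameters, the two-dimensional frames (one prosodic value and one spectral value per frame), and the log-likelihood-ratio score against a background model are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def ehmm_log_likelihood(frames, log_pi, log_A, means, variances):
    """Forward algorithm for an ergodic HMM (every state can reach every state).

    frames    : list of feature vectors, one per frame (assumed joint
                prosodic + spectral features)
    log_pi    : log initial state probabilities, length n
    log_A     : n x n matrix of log transition probabilities
    means     : per-state Gaussian means
    variances : per-state diagonal Gaussian variances
    """
    n = len(log_pi)
    # Initialisation: start in state j and emit the first frame.
    alpha = [
        log_pi[j] + log_gauss_diag(frames[0], means[j], variances[j])
        for j in range(n)
    ]
    # Recursion: sum over all predecessor states, then emit the next frame.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: sum over all final states.
    return logsumexp(alpha)


def verification_score(frames, speaker_model, background_model):
    """Hypothetical per-frame log-likelihood ratio (an assumed scoring rule)."""
    diff = ehmm_log_likelihood(frames, **speaker_model) - ehmm_log_likelihood(
        frames, **background_model
    )
    return diff / len(frames)


# Toy two-state models; all numbers are made up for illustration only.
half = math.log(0.5)
speaker = {
    "log_pi": [half, half],
    "log_A": [[math.log(0.9), math.log(0.1)], [math.log(0.2), math.log(0.8)]],
    "means": [[0.0, 0.0], [2.0, 1.0]],
    "variances": [[1.0, 1.0], [1.0, 1.0]],
}
background = {
    "log_pi": [half, half],
    "log_A": [[half, half], [half, half]],
    "means": [[-1.0, -1.0], [4.0, 3.0]],
    "variances": [[2.0, 2.0], [2.0, 2.0]],
}
frames = [[0.1, -0.2], [0.3, 0.1], [1.9, 1.2], [2.1, 0.8]]
score = verification_score(frames, speaker, background)
```

In this sketch the state sequence plays the role of the prosodic states, the transition matrix captures how they follow one another, and each state's observation distribution covers the prosodic and spectral features together, so a single likelihood replaces the separate scores that score-domain fusion would combine afterwards.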


Bibliographic reference.  Liao, Yuan-Fu / Chang, Wen-Chieh / Xie, Zong-You / Zeng, Ding-Yun / Juang, Yau-Tarng (2008): "Joint prosodic and spectral modeling for robust speaker verification", In SP-2008, 143-146.