EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Towards Combining Pitch and MFCC for Speaker Recognition Systems

Hassan Ezzaidi (1), Jean Rouat (1), Douglas O’Shaughnessy (2)

(1) DSA, ERMETIS, Université du Québec à Chicoutimi, Canada
(2) INRS-Télécommunications, Université du Québec, Canada

Speaker recognition systems usually do not take into account the dependence between the vocal source and the vocal tract. A feasibility study that retains this dependence is presented here. A model of joint probability functions of the pitch and the feature vectors is proposed. Three strategies are designed and compared for all female speakers taken from the SPIDRE corpus. The first operates on all voiced and unvoiced speech segments (baseline strategy). The second considers only the voiced speech segments, and the third includes the pitch information along with the standard MFCC. We use two pattern recognizers: LVQ-SLP and GMM. In all cases, we observe an increase in the identification rates, particularly when using a time duration of 500 ms (6% higher).
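The three strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration only: the function names, the convention that a pitch value of 0 marks an unvoiced frame, the use of log-pitch appended to the MFCC vector as a simple stand-in for the joint pitch/feature model, and a single diagonal-covariance Gaussian per speaker in place of the LVQ-SLP and GMM recognizers named in the abstract. The data are synthetic.

```python
import numpy as np

def select_frames(mfcc, pitch, strategy):
    """Build the feature matrix for one of the three strategies.

    mfcc:  (T, D) array of MFCC vectors, one row per frame.
    pitch: (T,) array of pitch values in Hz; 0 marks an unvoiced frame
           (an assumed convention, not taken from the paper).
    """
    voiced = pitch > 0
    if strategy == "baseline":        # all voiced and unvoiced frames
        return mfcc
    if strategy == "voiced":          # voiced frames only
        return mfcc[voiced]
    if strategy == "voiced+pitch":    # voiced frames, log-pitch appended
        return np.column_stack([mfcc[voiced], np.log(pitch[voiced])])
    raise ValueError(f"unknown strategy: {strategy}")

def fit_gaussian(features):
    """Fit one diagonal-covariance Gaussian (a simplification of a GMM)."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var

def avg_log_likelihood(features, model):
    """Average per-frame log-likelihood of the features under the model."""
    mean, var = model
    per_frame = -0.5 * (np.log(2 * np.pi * var)
                        + (features - mean) ** 2 / var).sum(axis=1)
    return per_frame.mean()

def identify(features, models):
    """Return the speaker label whose model scores the features highest."""
    return max(models, key=lambda name: avg_log_likelihood(features, models[name]))

def make_synthetic(seed, mfcc_mean, f0, n_frames=200, dim=12):
    """Generate toy MFCC and pitch tracks; every other frame is unvoiced."""
    rng = np.random.default_rng(seed)
    mfcc = rng.normal(mfcc_mean, 1.0, size=(n_frames, dim))
    pitch = rng.normal(f0, 10.0, size=n_frames)
    pitch[::2] = 0.0
    return mfcc, pitch

# Train one model per (hypothetical) speaker, then score a short test segment.
strategy = "voiced+pitch"
models = {
    "A": fit_gaussian(select_frames(*make_synthetic(1, 0.0, 180.0), strategy)),
    "B": fit_gaussian(select_frames(*make_synthetic(2, 3.0, 240.0), strategy)),
}
test_segment = select_frames(*make_synthetic(3, 0.0, 180.0, n_frames=50), strategy)
print(identify(test_segment, models))
```

Changing `strategy` to "baseline" or "voiced" runs the same pipeline on the other two feature sets, which mirrors the comparison described in the abstract. The toy data say nothing about the identification rates reported in the paper.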

Bibliographic reference.  Ezzaidi, Hassan / Rouat, Jean / O’Shaughnessy, Douglas (2001): "Towards combining pitch and MFCC for speaker recognition systems", In EUROSPEECH-2001, 2825-2828.