First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Experiments in Automatic Talker Verification Using Sub-Word Unit Hidden Markov Models

Aaron E. Rosenberg, Chin-Hui Lee, Frank K. Soong, Maureen A. McGee

Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ, USA

A talker verification system based on characterizing talker utterances as sequences of sub-word units represented by Hidden Markov Models (HMM's) has been implemented and tested. Two types of sub-word units have been studied: phone-like units (PLU's) and acoustic segment units (ASU's). PLU's are based on phonetic transcriptions of spoken utterances, whereas ASU's are extracted directly from the acoustic signal without use of any linguistic knowledge. The ASU representation has the advantage of not requiring transcriptions of training utterances. Verification performance has been evaluated on a 20-talker database of isolated digit utterances and a 20-talker database of continuously spoken sentences drawn from a 1000-word vocabulary. In the isolated digit experiments, the verification equal-error rate is approximately 7 to 8% for 1-digit test utterances (approximately 0.5 sec in duration) and 1% or less for 7-digit test utterances (approximately 3.5 sec in duration), with only small differences in performance between PLU- and ASU-based representations. In the continuously spoken sentence experiments using ASU's, the best verification performance is a 1.7% equal-error rate for 5-second test trials. This is obtained using 64 ASU models trained from 90 seconds of speech. In addition, a technique for updating models using data from current test utterances has been devised and implemented. Using this adaptation technique for isolated digits, the error rate falls to 6% for 1-digit utterances and to less than 0.5% for 7-digit utterances. The experiments show that excellent verification performance can be obtained with sub-word units represented by HMM's. The techniques can be readily extended from small vocabularies and isolated words to large vocabularies and connected sentences.
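
The results above are reported as equal-error rates, that is, the error rate at the decision threshold where false acceptances of impostors and false rejections of the true talker are equally frequent. The abstract does not describe how the rate was computed, so the following is only a minimal sketch of one common way to estimate it from two lists of verification scores. The function name `equal_error_rate`, the convention that a higher score means a better match, and the sample scores are all assumptions made for illustration; they are not taken from the paper.

```python
def equal_error_rate(true_scores, impostor_scores):
    """Estimate the equal-error rate from verification scores.

    Assumption: a trial is accepted when its score is >= the threshold,
    so higher scores indicate a better match to the claimed talker.
    Returns (eer, threshold) for the candidate threshold at which the
    false-acceptance and false-rejection rates are closest.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(true_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection: a true-talker trial scoring below the threshold.
        frr = sum(s < t for s in true_scores) / len(true_scores)
        # False acceptance: an impostor trial scoring at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
true_talker = [2.0, 3.0, 4.0, 5.0]
impostor = [0.0, 1.0, 2.5, 1.5]
eer, threshold = equal_error_rate(true_talker, impostor)
```

With a finite set of trials the two error rates may never coincide exactly, which is why the sketch reports the midpoint at the threshold where they are closest; interpolating between adjacent thresholds is another reasonable choice.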

