Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Speaker Recognition Using Concatenated Phoneme Models

Tomoko Matsui, Sadaoki Furui

NTT Human Interface Laboratories, Tokyo, Japan

This paper investigates a new text-dependent speaker recognition method in which the key text can be changed every time the recognizer is used, and the voice is accepted only when the true speaker utters the prompted text. This solves a problem of conventional methods, which use fixed texts and can therefore be defeated by playing back a recording of the true speaker's voice. The method is based on the following techniques. First, a text-independent speaker spectral model is created for each speaker as a 1-state, 64-mixture Gaussian hidden Markov model (HMM) with diagonal covariance matrices. Next, the mixture weighting factors are estimated for each phoneme class using training utterances of each speaker. Then the likelihood of the phoneme-class-concatenation model corresponding to the prompted key text is used for the recognition decision. With the proposed method, the rejection rate for speech uttered by the true speaker that differs from the key text is as high as 80.7%, and the verification rate is 99.9%.
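The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy shared pool of 2 one-dimensional Gaussians instead of 64 spectral mixtures, a single EM-style weight update as the estimator for each phoneme class, and a Viterbi alignment without transition probabilities over the concatenated classes. The function names, the phoneme labels "a" and "b", and all numeric values are hypothetical.

```python
import numpy as np

def log_gaussians(frames, means, variances):
    """Log density of each frame under each diagonal-covariance Gaussian.

    frames: (T, D); means, variances: (M, D). Returns an array of shape (T, M).
    """
    diff = frames[:, None, :] - means[None, :, :]
    exponent = np.sum(diff ** 2 / variances, axis=2)            # (T, M)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    return -0.5 * (exponent + log_norm)

def reestimate_weights(frames, means, variances, prior_weights):
    """One EM-style update of the mixture weights for one phoneme class.

    Assumed estimator: the average posterior of each shared Gaussian over
    the training frames labeled with that class.
    """
    log_post = log_gaussians(frames, means, variances) + np.log(prior_weights)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return np.exp(log_post).mean(axis=0)

def concatenated_log_likelihood(frames, phoneme_seq, means, variances, class_weights):
    """Best-path log-likelihood of frames under a left-to-right chain of
    1-state phoneme-class models (transition probabilities ignored)."""
    log_g = log_gaussians(frames, means, variances)              # (T, M)
    # emis[t, k]: log-likelihood of frame t under the k-th class in the text
    emis = np.stack(
        [np.logaddexp.reduce(log_g + np.log(class_weights[p]), axis=1)
         for p in phoneme_seq],
        axis=1,
    )
    num_classes = emis.shape[1]
    score = np.full(num_classes, -np.inf)
    score[0] = emis[0, 0]                      # the path must start in the first class
    for t in range(1, emis.shape[0]):
        moved = np.concatenate(([-np.inf], score[:-1]))  # advance to the next class
        score = np.maximum(score, moved) + emis[t]       # or stay in the current one
    return score[-1]                           # the path must end in the last class

# Toy speaker model: two shared Gaussians, weights re-estimated per phoneme class.
means = np.array([[0.0], [5.0]])
variances = np.array([[1.0], [1.0]])
uniform = np.array([0.5, 0.5])
class_weights = {
    "a": reestimate_weights(np.array([[0.0], [0.2], [-0.3]]), means, variances, uniform),
    "b": reestimate_weights(np.array([[5.0], [4.8], [5.3]]), means, variances, uniform),
}

# The same utterance scored against the prompted text and against a different text.
utterance = np.array([[0.1], [-0.1], [0.0], [4.9], [5.2], [5.0]])
score_prompted = concatenated_log_likelihood(utterance, ["a", "b"], means, variances, class_weights)
score_other = concatenated_log_likelihood(utterance, ["b", "a"], means, variances, class_weights)
print(score_prompted > score_other)
```

In this toy setup the utterance scores higher against the text it actually contains than against a reordered text, which is the property a prompted-text decision relies on. How the likelihood is normalized and thresholded for the accept/reject decision is not specified in the abstract and is left out of the sketch.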


Bibliographic reference. Matsui, Tomoko / Furui, Sadaoki (1992): "Speaker recognition using concatenated phoneme models", in ICSLP-1992, 603-606.