EUROSPEECH '95

This paper investigates improvements to the vector quantisation (VQ) distortion method of text-independent speaker identification. Each speaker has a conventional codebook of instantaneous cepstral vectors drawn from that speaker's training data, plus one second-level codebook of transitional cepstral vectors for each codeword of the instantaneous codebook. On a 20-speaker database of 30 phonetically rich utterances, the error rate falls from 6.5% for a conventional codebook of size 128 to 5.5% for a codebook with 16 transitional codewords for each of the 128 instantaneous codewords (128x16). On a 20-speaker database of spoken digits, the error rate falls from 3.1% for a conventional (128x0) codebook to 0.9% for a (128x4) codebook. Alternatively, codeword-specific transitional codebooks can hold the error rate constant while requiring fewer codeword comparisons. The results also show that, given a sufficiently large transitional codebook, transitional distortion scores computed after instantaneous pre-classification can outperform purely instantaneous distortion scores.
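The two-level scoring described above can be illustrated with a short sketch. It is not the authors' implementation and rests on several assumptions: squared Euclidean distortion, simple first differences between consecutive frames as a stand-in for the transitional (derivative) cepstra, a hypothetical `weight` that blends instantaneous and transitional distortion, and codebooks that are already trained (the training step, e.g. by k-means or LBG, is omitted). All function names are illustrative. Each frame is first pre-classified to its nearest instantaneous codeword, and only that codeword's transitional codebook is searched. Under that assumption, an MxK codebook costs M + K comparisons per frame (128 + 16 = 144 for 128x16), whereas a flat codebook of MxK codewords would cost MxK.

```python
from math import inf
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical speaker model: (instantaneous codebook, one transitional codebook per codeword)
Model = Tuple[List[Vector], List[List[Vector]]]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], v: Vector) -> Tuple[int, float]:
    """Return (index, distortion) of the codeword closest to v."""
    best_i, best_d = -1, inf
    for i, c in enumerate(codebook):
        d = sq_dist(c, v)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def deltas(frames: Sequence[Vector]) -> List[List[float]]:
    """First differences of consecutive frames (assumed stand-in for transitional cepstra)."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def speaker_score(frames: Sequence[Vector], inst_cb: List[Vector],
                  trans_cbs: List[List[Vector]], weight: float = 0.5) -> float:
    """Average per-frame distortion of an utterance against one two-level codebook.

    weight=0 gives purely instantaneous scoring; weight=1 gives purely
    transitional scoring after instantaneous pre-classification.
    """
    d_frames = deltas(frames)
    if not d_frames:
        raise ValueError("need at least two frames to form a transitional vector")
    total = 0.0
    # Pair each frame (from the second onward) with the delta that ends at it.
    for x, d in zip(frames[1:], d_frames):
        i, inst = nearest(inst_cb, x)          # pre-classify to an instantaneous codeword
        _, trans = nearest(trans_cbs[i], d)    # search only that codeword's transitional codebook
        total += (1.0 - weight) * inst + weight * trans
    return total / len(d_frames)


def identify(frames: Sequence[Vector], models: Dict[str, Model], weight: float = 0.5) -> str:
    """Pick the speaker whose two-level codebook yields the lowest average distortion."""
    return min(models, key=lambda s: speaker_score(frames, *models[s], weight=weight))


if __name__ == "__main__":
    models: Dict[str, Model] = {
        "A": ([[0.0], [10.0]], [[[1.0]], [[-1.0]]]),
        "B": ([[5.0]], [[[0.0]]]),
    }
    utterance = [[0.0], [1.0], [2.0]]
    print(identify(utterance, models))
```

Setting `weight` to 0 reduces the sketch to conventional instantaneous VQ scoring (the "x0" case), so the same code can compare purely instantaneous, purely transitional, and blended scores on one set of codebooks.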
Bibliographic reference. Wagner, Michael / Mason, John S. / Millar, J. Bruce (1995): "Speaker identification using vector quantisation with codeword-specific derivative coding", In EUROSPEECH-1995, 383-386.