EUROSPEECH 2001 Scandinavia
Familiar-speaker information is explored using non-acoustic features in NIST's new extended data speaker detection task. Word unigrams and bigrams, used in a traditional target/background likelihood ratio framework, are shown to give surprisingly good performance. Performance continues to improve with additional training and/or test data. Bigram performance is also found to depend on the sex and age differences between target and model speakers. These initial experiments strongly suggest that further exploration of familiar-speaker characteristics will be an interesting and valuable research direction for recognition of speakers in conversational speech.
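To make the target/background likelihood ratio framework concrete, the following is a minimal sketch of how a word n-gram score might be computed. It is an illustration under stated assumptions, not the paper's implementation: the function names (`ngrams`, `ngram_counts`, `llr_score`), the additive smoothing constant `alpha`, and the per-token averaging of the score are all choices made here, since the abstract does not specify them.

```python
import math
from collections import Counter


def ngrams(words, n):
    """Return the word n-grams (as tuples) found in a token sequence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(conversations, n):
    """Accumulate n-gram counts over a list of tokenized conversations."""
    counts = Counter()
    for words in conversations:
        counts.update(ngrams(words, n))
    return counts


def llr_score(test_words, target_counts, background_counts, n, alpha=0.5):
    """Average log-likelihood ratio of the test n-grams, target vs. background.

    Assumption: probabilities are relative frequencies with additive
    smoothing (alpha); the abstract does not state how they are estimated.
    """
    # Number of n-gram types seen in either model, plus one slot for unseen types.
    num_types = len(set(target_counts) | set(background_counts)) + 1
    target_total = sum(target_counts.values())
    background_total = sum(background_counts.values())

    test_ngrams = ngrams(test_words, n)
    if not test_ngrams:
        return 0.0  # Test segment too short to contain any n-gram.

    total = 0.0
    for gram in test_ngrams:
        p_target = (target_counts[gram] + alpha) / (target_total + alpha * num_types)
        p_background = (background_counts[gram] + alpha) / (background_total + alpha * num_types)
        total += math.log(p_target / p_background)
    # Normalize by the number of test n-grams so scores are comparable across lengths.
    return total / len(test_ngrams)
```

A positive score indicates that the test transcript's n-grams are more probable under the target speaker's model than under the background model; a detection decision would then compare the score with a threshold. In this sketch, `target_counts` would be built from the target speaker's training conversations and `background_counts` from conversations of other speakers.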
Bibliographic reference. Doddington, George (2001): "Speaker recognition based on idiolectal differences between speakers", In EUROSPEECH-2001, 2521-2524.