EUROSPEECH 2001 Scandinavia
The model presented in this paper consists of a set of subband GMMs trained on speech data corrupted with white Gaussian noise at several SNRs. In the recognition stage, the GMM that yields the maximum likelihood accumulated over all input frames is selected as the optimal GMM for each subband. The likelihoods of the selected GMMs are then recombined over the subbands to give a speaker identification score. To evaluate the performance of this model, text-independent speaker identification experiments were conducted under five different noisy environments. For comparison, three other models were also evaluated: a subband model trained on clean speech, a multi-SNR fullband model, and a fullband model trained on clean speech. Results show that the multi-SNR subband model is very effective under a wide variety of noisy environments. Additional improvement was observed when the optimal GMM was selected on a short-term basis rather than over the whole input.
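The recognition stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, assumes that recombination is a plain sum of subband log-likelihoods, and assumes that short-term selection means choosing the best SNR model independently for each fixed-length block of frames. The names gmm_frame_loglik, speaker_score, identify, and window are hypothetical and chosen for illustration only.

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM (assumed form).
    frames: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True))).ravel()

def speaker_score(subband_frames, speaker_model, window=None):
    """Score one speaker.
    subband_frames: list of (T, D_b) arrays, one per subband.
    speaker_model: per subband, a list of GMMs (one per training SNR),
                   each GMM given as (weights, means, variances).
    window: None selects the best SNR GMM over the whole input;
            an int selects it per block of that many frames (assumption)."""
    total = 0.0
    for frames, snr_gmms in zip(subband_frames, speaker_model):
        # Row s holds the per-frame log-likelihoods under the s-th SNR GMM.
        ll = np.stack([gmm_frame_loglik(frames, *g) for g in snr_gmms])
        n_frames = ll.shape[1]
        block = n_frames if window is None else window
        for start in range(0, n_frames, block):
            # Accumulate over the block, then keep the best SNR model.
            total += ll[:, start:start + block].sum(axis=1).max()
    # Recombination over subbands is a sum of log-likelihoods (assumption).
    return total

def identify(subband_frames, models, window=None):
    """Return the speaker label with the highest recombined score."""
    scores = {spk: speaker_score(subband_frames, m, window)
              for spk, m in models.items()}
    return max(scores, key=scores.get)
```

Under these assumptions, the short-term score can never be lower than the whole-input score for the same speaker, because the best model is chosen separately for each block. Whether that translates into better identification depends on how the scores of competing speakers change, which is what the experiments in the paper address.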
Bibliographic reference. Yoshida, Kenichi / Takagi, Kazuyuki / Ozeki, Kazuhiko (2001): "A multi-SNR subband model for speaker identification under noisy environments", In EUROSPEECH-2001, 2849-2852.