EUROSPEECH '95

Both hidden Markov models (HMMs) and neural networks have already been used as production systems for speaker identification and verification. Recently, [9] showed that ergodic multi-state hidden Markov models do not outperform one-state "hidden" Markov models, i.e. Gaussian mixture models, for speaker recognition. That work showed that the important characteristic of these models is the total number of mixture components, not the number of states. These HMMs are thus unable to exploit temporal information for speaker recognition. On the other hand, recent experiments have shown that, for neural predictive systems, modeling non-stationarity significantly improves performance [6]. Here we develop such models, which we refer to as multi-state predictive neural networks (MSPNNs). We study the ability of these systems to perform speaker identification and discuss the superiority of multi-state over one-state models. We provide results on 15 talkers from the TIMIT database.
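To make the idea of a predictive speaker model concrete, the following is a minimal sketch of identification by prediction error, under strong assumptions. It is not the MSPNN described above: a linear least-squares predictor stands in for the neural predictor, each speaker has a single state (one predictor), and the "speech" frames are synthetic autoregressive sequences. All names (`synth_speaker`, `fit_predictor`, `identify`) and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def synth_speaker(a, n, rng, dim=4):
    """Synthetic stand-in for a speaker: frames follow x[t] = a * x[t-1] + noise."""
    x = np.zeros((n, dim))
    for t in range(1, n):
        x[t] = a * x[t - 1] + rng.standard_normal(dim)
    return x

def _context(frames, order):
    """Stack the `order` previous frames as input; the next frame is the target."""
    n = len(frames)
    inputs = np.hstack([frames[i:n - order + i] for i in range(order)])
    targets = frames[order:]
    return inputs, targets

def fit_predictor(frames, order=2):
    """Fit a linear predictor (assumed stand-in for a neural predictor)."""
    inputs, targets = _context(frames, order)
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights

def prediction_error(weights, frames, order=2):
    """Mean squared error of predicting each frame from its context."""
    inputs, targets = _context(frames, order)
    return float(np.mean((inputs @ weights - targets) ** 2))

def identify(models, frames, order=2):
    """Return the speaker whose predictor gives the lowest prediction error."""
    return min(models, key=lambda name: prediction_error(models[name], frames, order))

rng = np.random.default_rng(0)

# One predictor per (synthetic) speaker, trained on that speaker's frames.
models = {
    "A": fit_predictor(synth_speaker(0.9, 500, rng)),
    "B": fit_predictor(synth_speaker(-0.5, 500, rng)),
}

# Unseen test sequences from each synthetic speaker.
guess_a = identify(models, synth_speaker(0.9, 200, rng))
guess_b = identify(models, synth_speaker(-0.5, 200, rng))
print(guess_a, guess_b)
```

A multi-state variant would, under the same assumptions, hold several predictors per speaker and assign each frame to one of them, so that different segments of an utterance can be modeled by different predictors; the sketch above deliberately keeps only the one-state case.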
Bibliographic reference. Artieres, T. / Gallinari, Patrick (1995): "Multi-state predictive neural networks for text-independent speaker recognition", In EUROSPEECH-1995, 633-636.