4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper compares model architectures for TIMIT phoneme recognition. The baseline is a conventional diagonal-covariance Gaussian mixture HMM. It is compared with two hybrid MLP/HMMs, both subject to the same restrictions on input context and output states as the Gaussian mixture system. All free parameters in the three systems are jointly optimised using the same global discriminative criterion. A Forward decoder, with total likelihood scoring, is used for recognition. While global discriminative training is found to improve the baseline HMM significantly, the differences between the Gaussian and MLP-based architectures are small. The Gaussian mixture system, however, performs slightly better at the lowest complexity levels.
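The abstract does not describe the decoder in detail. As a minimal sketch, assuming that "total likelihood scoring" means summing over all state paths (the standard forward recursion) rather than keeping only the single best path as Viterbi scoring does, the difference can be illustrated on a toy HMM. The function names, the two-state model, and all numbers below are illustrative assumptions and are not taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(log_init, log_trans, log_obs, combine):
    """Score one model against one observation sequence in the log domain.

    log_init[j]     : log P(state j at t = 0)
    log_trans[i][j] : log P(state j at t | state i at t - 1)
    log_obs[t][j]   : log-likelihood of frame t given state j
    combine         : logsumexp -> forward (total likelihood) score
                      max       -> Viterbi (best single path) score
    """
    n = len(log_init)
    alpha = [log_init[j] + log_obs[0][j] for j in range(n)]
    for t in range(1, len(log_obs)):
        alpha = [
            combine([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_obs[t][j]
            for j in range(n)
        ]
    return combine(alpha)


def to_log(x):
    """Elementwise log of a (possibly nested) list of probabilities."""
    if isinstance(x, list):
        return [to_log(v) for v in x]
    return math.log(x)


# Hypothetical two-state model and two frames of per-state likelihoods.
init = to_log([0.6, 0.4])
trans = to_log([[0.7, 0.3], [0.4, 0.6]])
obs = to_log([[0.9, 0.2], [0.1, 0.8]])

forward_score = score(init, trans, obs, logsumexp)  # sums over all paths
viterbi_score = score(init, trans, obs, max)        # best path only
```

Because the forward score accumulates probability mass from every path, it is never lower than the Viterbi score for the same model and observations. A decoder over phoneme sequences would apply this kind of scoring to competing hypotheses; that search is beyond this sketch.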
Bibliographic reference. Johansen, Finn Tore (1996): "A comparison of hybrid HMM architectures using global discriminative training", In ICSLP-1996, 498-501.