5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

String-Level MCE for Continuous Phoneme Recognition

Erik McDermott, Shigeru Katagiri

ATR Human Information Processing Res. Labs, Seikacho, Soraku-gun, Kyoto, Japan

In this paper, we present results for the Minimum Classification Error (MCE) framework [1] for discriminative training, applied to tasks in continuous phoneme recognition. The results obtained using MCE are compared with results for Maximum Likelihood Estimation (MLE). We examine the ability of MCE to attain high recognition performance with a small number of parameters. Phoneme-level and string-level MCE loss functions were used as the optimization criteria for a Prototype-Based Minimum Error Classifier (PBMEC) [2] and an HMM [3]. The former was optimized using Generalized Probabilistic Descent; the latter was optimized using an approximate second-order method, the Quickprop algorithm. Two databases were used in this evaluation: 1) the ATR 5240 isolated word datasets for 6 speakers, in both speaker-dependent and multi-speaker mode; 2) the TIMIT database. For both databases, MCE training yielded striking gains in performance and classifier compactness compared to MLE baselines. For instance, through MCE training, performance similar to that of the Maximum Likelihood Successive State Splitting algorithm (ML-SSS) [4] could be obtained with 20 times fewer parameters.
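As a rough illustration of the kind of loss referred to above, the following sketch computes a smoothed MCE loss for a single token, using the form commonly given in the MCE literature: a misclassification measure that compares the correct category's discriminant score with a soft maximum over its competitors, passed through a sigmoid. The abstract does not state the exact formulation used in the paper, so the function name `mce_loss`, the parameters `eta` and `alpha`, and their default values are assumptions made for illustration only. In a string-level setting, the scores would presumably be those of the correct transcription and of competing string hypotheses.

```python
import math

def mce_loss(scores, correct, eta=1.0, alpha=1.0):
    """Smoothed MCE loss for one token (illustrative sketch, not the paper's code).

    scores  -- discriminant values g_j(x), one per category; larger is better
    correct -- index of the correct category
    eta     -- assumed smoothing constant; large eta approaches the best rival
    alpha   -- assumed sigmoid slope
    """
    g_correct = scores[correct]
    rivals = [g for j, g in enumerate(scores) if j != correct]

    # Soft maximum over competitors: (1/eta) * log(mean(exp(eta * g_j))),
    # computed with the maximum factored out for numerical stability.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    soft_max_rival = m + math.log(mean_exp) / eta

    # Misclassification measure: negative when the correct category wins.
    d = -g_correct + soft_max_rival

    # The sigmoid maps d to a differentiable approximation of the 0/1 error.
    return 1.0 / (1.0 + math.exp(-alpha * d))
```

The loss is close to 0 when the correct category scores well above its rivals, close to 1 when a rival wins clearly, and exactly 0.5 when the scores are tied. Because it is differentiable, it can be minimized by gradient-based methods such as those named in the abstract.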


Bibliographic reference.  McDermott, Erik / Katagiri, Shigeru (1997): "String-level MCE for continuous phoneme recognition", In EUROSPEECH-1997, 123-126.