INTERSPEECH 2006 - ICSLP
This paper describes an implementation of a discriminative acoustical model, a Conditional Random Field (CRF), within a Dynamic Bayes Net (DBN) formulation of a Hierarchic Hidden Markov Model (HHMM) phone recognizer. This CRF-DBN topology accounts for phone transition dynamics in conditional probability distributions over random variables associated with observed evidence, and therefore has less need for hidden variable states corresponding to transitions between phones, leaving more hypothesis space available for modeling higher-level linguistic phenomena such as syntax and semantics. The model also has the interesting property that it explicitly represents likely formant trajectories and formant targets of modeled phones in its random variable distributions, making it more linguistically transparent than models based on traditional HMMs with conditionally independent evidence variables. Results on the standard TIMIT phone recognition task show that this CRF evidence model, even with a relatively simple first-order feature set, is competitive with standard HMMs and DBN variants using static Gaussian mixture models on MFCC features.
Bibliographic reference. Schuler, William / Miller, Tim / Wu, Stephen / Exley, Andrew (2006): "Dynamic evidence models in a DBN phone recognizer", In INTERSPEECH-2006, paper 1770-Tue3A1O.6.