Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Dynamic Evidence Models in a DBN Phone Recognizer

William Schuler, Tim Miller, Stephen Wu, Andrew Exley

University of Minnesota, USA

This paper describes an implementation of a discriminative acoustic model - a Conditional Random Field (CRF) - within a Dynamic Bayes Net (DBN) formulation of a Hierarchic Hidden Markov Model (HHMM) phone recognizer. This CRF-DBN topology accounts for phone transition dynamics in conditional probability distributions over random variables associated with observed evidence, and therefore has less need for hidden variable states corresponding to transitions between phones, leaving more hypothesis space available for modeling higher-level linguistic phenomena such as syntax and semantics. The model also has the interesting property that it explicitly represents likely formant trajectories and formant targets of modeled phones in its random variable distributions, making it more linguistically transparent than models based on traditional HMMs with conditionally independent evidence variables. Results on the standard TIMIT phone recognition task show that this CRF evidence model, even with a relatively simple first-order feature set, is competitive with standard HMMs and DBN variants using static Gaussian mixture models on MFCC features.


Bibliographic reference.  Schuler, William / Miller, Tim / Wu, Stephen / Exley, Andrew (2006): "Dynamic evidence models in a DBN phone recognizer", In INTERSPEECH-2006, paper 1770-Tue3A1O.6.