4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Decision trees have been used in speech recognition with large numbers of context-dependent HMM models, to provide models for contexts not seen in training. Trees are usually created by successive node splitting decisions, based on how well a single Gaussian or Poisson density fits the data associated with a node. We introduce a new node splitting criterion, derived from the maximum likelihood fitting of the complex node distributions with Gaussian tied-mixture densities. We also carry the use of decision trees for tying HMM models a step further. In addition to questions about the phonetic class of neighbouring phonemes, we allow questions about the HMM model state to be asked. The resulting decision tree maximizes the likelihood by adjusting the amount of parameter tying simultaneously across state and context. Accuracy improvement and model size reduction were evaluated on a gender-dependent 5K closed-vocabulary WSJ task, using the SI-84 and SI-284 training sets, for tied-mixture and continuous HMM models. The new decision trees are shown to reduce both error rate and model size, while being computationally cheap enough to allow consideration of two preceding and two following phones for the context.
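The abstract contrasts the paper's tied-mixture criterion with the conventional one, in which each node's data is fitted by a single Gaussian and a split is chosen to maximize the resulting likelihood gain. A minimal sketch of that conventional single-Gaussian criterion for 1-D data follows (the function names and the variance floor are illustrative assumptions, not from the paper):

```python
import math

def gaussian_loglik(data):
    # Log-likelihood of 1-D samples under a maximum-likelihood
    # single-Gaussian fit: -n/2 * (log(2*pi*var) + 1).
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    var = max(var, 1e-8)  # variance floor to avoid log(0) (assumption)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def split_gain(parent, left, right):
    # Likelihood gain from splitting the parent node's data into the
    # left/right subsets induced by a candidate phonetic question.
    # The split with the largest gain is applied at each step.
    return gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(parent)

# A question that separates two well-separated clusters yields a large gain.
left = [0.0, 0.1, -0.1]
right = [5.0, 5.1, 4.9]
gain = split_gain(left + right, left, right)
```

Because each subset receives its own ML fit, the gain is never negative; tree growing stops when the best gain falls below a threshold. The paper's contribution replaces the single-Gaussian fit above with a Gaussian tied-mixture fit of the node distribution, and enlarges the question set to include the HMM state.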
Bibliographic reference. Boulianne, Gilles / Kenny, Patrick (1996): "Optimal tying of HMM mixture densities using decision trees", In ICSLP-1996, 350-353.