13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors

Keith Kintzley (1), Aren Jansen (1,2), Hynek Hermansky (1,2)

(1) Dept. of Electrical and Computer Engineering (2) HLT Center of Excellence
Johns Hopkins University, Baltimore, MD, USA

The intrinsic advantages of whole-word acoustic modeling are offset by the problem of data sparsity. To address this, we present several parametric approaches to estimating intra-word phonetic timing models under the assumption that relative timing is independent of word duration. We show evidence that the timing of phonetic events is well described by a Gaussian distribution. We explore the construction of models when no keyword examples are available (dictionary-based) and when keyword examples are abundant (Gaussian mixture models), and we present a Bayesian approach that unifies the two. Applying these techniques in a point process model keyword spotting framework, we demonstrate a 55% relative improvement in performance for models constructed from few examples.
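As a rough illustration of how a prior and a handful of examples can be combined, the sketch below applies the textbook conjugate Gaussian update to the mean of one phonetic event's relative timing. It is not the authors' implementation: the function names, the assumption of a known observation variance, and all numeric values are hypothetical, and the dictionary-derived prior is simply taken as given.

```python
import numpy as np


def relative_times(event_times, word_start, word_end):
    """Map absolute event times to [0, 1] by normalizing by word duration.

    This mirrors the assumption that relative timing is independent of
    word duration. All arguments are hypothetical inputs in seconds.
    """
    event_times = np.asarray(event_times, dtype=float)
    return (event_times - word_start) / (word_end - word_start)


def map_gaussian_mean(samples, prior_mean, prior_var, obs_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    Assumes the observation variance `obs_var` is known (a simplification
    made for this sketch). Returns (posterior mean, posterior variance);
    for a Gaussian posterior the mean is also the MAP estimate.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if n == 0:
        # No keyword examples: fall back to the prior alone.
        return float(prior_mean), float(prior_var)
    # Precisions (inverse variances) of prior and data add.
    post_precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + samples.sum() / obs_var)
    return float(post_mean), float(post_var)


if __name__ == "__main__":
    # Hypothetical prior: the event is expected halfway through the word.
    # One observed example places it at 30% of the word duration.
    mean, var = map_gaussian_mean([0.3], prior_mean=0.5,
                                  prior_var=0.01, obs_var=0.01)
    print(mean, var)
```

With no examples the estimate equals the prior, and as examples accumulate it moves toward the sample mean, which is the qualitative behavior a method unifying the dictionary-based and example-rich regimes would be expected to show.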

Index Terms: phonetic timing, whole-word modeling, keyword spotting, point process model

Bibliographic reference. Kintzley, Keith / Jansen, Aren / Hermansky, Hynek (2012): "MAP estimation of whole-word acoustic models with dictionary priors", in INTERSPEECH-2012, 787-790.