The log-likelihood computation is one of the most computationally expensive tasks in speech recognition systems that use continuous mixture density HMMs. In the Philips large-vocabulary continuous-speech recognition system it consumes 50% to 75% of the decoding time. In our system the log-likelihood computation amounts to a nearest-neighbor search, i.e. a search for the component density of a mixture density whose mean vector has minimal distance to the observed feature vector. In this paper, we show that a Hamming Distance Approximation (HDA) of the angles between the vectors leads to a powerful nearest-neighbor search technique with negligible memory demands. With this technique, the likelihood computation was sped up by a factor of 10 without a significant increase in the word error rate of our large-vocabulary speech recognizer. Since the likelihood computation in this system consumed 66% of the recognition runtime, the overall decoding runtime could be reduced by a factor of 2.5. We also report results on TI-digits and the WSJ task.
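The idea of preselecting nearest-neighbor candidates by a Hamming distance can be illustrated with a minimal sketch. The abstract does not specify the binary coding, so the sketch assumes one simple choice: each vector is reduced to the sign pattern of its components, and the number of differing sign bits serves as a crude proxy for the angle between two vectors. The function names (`sign_code`, `hamming`, `nearest_hda`) and the `shortlist` parameter are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: sign-bit coding is an ASSUMPTION, not
# necessarily the coding used by the authors.

def sign_code(vec):
    """Pack the sign pattern of vec into an int (bit i set if vec[i] >= 0)."""
    code = 0
    for i, x in enumerate(vec):
        if x >= 0.0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of bit positions in which the two codes differ."""
    return bin(a ^ b).count("1")

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_exact(means, x):
    """Index of the mean vector closest to x (full search)."""
    return min(range(len(means)), key=lambda k: sq_dist(means[k], x))

def nearest_hda(means, codes, x, shortlist=2):
    """Rank means by Hamming distance of sign codes, then run the
    exact distance computation only on the `shortlist` best candidates."""
    cx = sign_code(x)
    ranked = sorted(range(len(means)), key=lambda k: hamming(codes[k], cx))
    return min(ranked[:shortlist], key=lambda k: sq_dist(means[k], x))

# Hypothetical toy data: four mean vectors and one observation.
means = [[1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1], [-1, 1, -1, 1]]
codes = [sign_code(m) for m in means]   # one small integer per density
x = [0.9, 1.2, 0.8, -0.1]
best = nearest_hda(means, codes, x, shortlist=2)
```

In this sketch the only extra storage is one integer code per component density, which is in line with the negligible memory demands claimed above. The shortlist size trades speed against the risk of missing the true nearest neighbor; with `shortlist=len(means)` the result coincides with the full search.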
Bibliographic reference. Beyerlein, Peter / Ullrich, Meinhard (1995): "Hamming distance approximation for a fast log-likelihood computation for mixture densities", In EUROSPEECH-1995, 1083-1086.