INTERSPEECH 2006  ICSLP

Realtime performance is a very important goal for embedded speech recognition systems, where the evaluation of likelihoods for Gaussian mixture models (GMM) usually dominates the computation of a continuous density hidden Markov model (CDHMM) based system. The Bucket Box Intersection (BBI) algorithm is an optimization technique that uses a KDimensional binary tree to speed up the score computation of GMM without significantly hurting the recognition accuracy. In this paper, we propose three improvements to the traditional BBI algorithm. First, we define the optimal dividing hyperplane as the plane that generates minimum expected number of mixture evaluations instead of the median hyperplane. The size of BBI tree is reduced largely because of that. Second, we refine the location of dividing plane as the one that has the same Mahalanobis distance to the closest dividing mixture pair, instead of the boundary of Gaussian box. By doing this, we are able to improve recognition accuracy. Finally, we introduce the dividing planes which run across two dimensions to boost the range of dividing plane candidates and thus bring more speedups. We evaluated these techniques using Conversay’s speech engine CASSI in 2 different domains. The experimental results of new BBI algorithm show significant performance improvement over traditional BBI algorithm. Compared to the baseline system with no BBI algorithm implementation, we were able to speed up Gaussian computations by 50% with a less than 5% relative increase in word error rate.
Bibliographic reference. Tang, Min / Ganapathiraju, Aravind (2006): "Improvements to bucket box intersection algorithm for fast GMM computation in embedded speech recognition systems", In INTERSPEECH2006, paper 1678Mon3BuP.8.