4th International Conference on Spoken Language Processing, Philadelphia, PA, USA
This paper first introduces the theory of Stochastic Trajectory Models (STMs). An STM represents the acoustic observations of a speech unit as clusters of trajectories in a parameter space. The trajectories are modeled by a mixture of probability density functions of random sequences of states. Each state is associated with a multivariate Gaussian density function, optimized at the state-sequence level. Because STM does not rely on the HMM assumptions, it can exploit information, such as time correlation within an observation sequence, that those assumptions hide. After an analysis of the characteristics of Chinese speech, the acoustic units suited to recognizing continuous Chinese speech with STMs are discussed, and phone-like units, which are similar to or smaller than Initial-Final-like units, are suggested. The total number of phone-like units (about 50) is the smallest among almost all Chinese speech recognition systems. Consequently, the training database can be very small. The performance of continuous Chinese speech recognition based on STM is studied using the VINICS system. The experimental results demonstrate the efficiency of STM and the consistency of the phone-like units.
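The trajectory model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each observed segment is linearly resampled to a fixed number of points, each mixture component holds one Gaussian per point, covariances are diagonal, and no duration model is included. The names `resample`, `log_gaussian_diag`, and `stm_log_likelihood` are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def resample(frames, q):
    """Linearly resample a frame sequence to q points (assumed preprocessing)."""
    n = len(frames)
    if n == 1 or q == 1:
        return [list(frames[0]) for _ in range(q)]
    points = []
    for i in range(q):
        pos = i * (n - 1) / (q - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        w = pos - lo
        points.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return points


def stm_log_likelihood(frames, components):
    """Log-likelihood of one segment under a mixture of trajectory components.

    components: list of (weight, means, variances), where means and variances
    are lists of q vectors, one per state along the trajectory.
    """
    q = len(components[0][1])
    points = resample(frames, q)
    logs = []
    for weight, means, variances in components:
        ll = math.log(weight)
        # The whole state sequence is scored jointly for this component.
        for x, m, v in zip(points, means, variances):
            ll += log_gaussian_diag(x, m, v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

In this sketch, a component is a whole sequence of state Gaussians, so a rising and a falling trajectory through the same region of the parameter space are scored by different components rather than being merged frame by frame.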
Bibliographic reference. Ma, Xiaohui / Gong, Yifan / Fu, Yuqing / Lu, Jiren / Haton, Jean-Paul (1996): "A study on continuous Chinese speech recognition based on stochastic trajectory models", In ICSLP-1996, 482-485.