4th International Conference on Spoken Language Processing, Philadelphia, PA, USA
We propose to model the acoustic-phonetic structure of the Arabic language using a single ergodic hidden Markov model (HMM), since a single HMM with about 40-50 states can represent all acoustic-phonetic effects. In this paper, we present the techniques and algorithms used to build the model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent Arabic speech. The model is trained, segmented (manually and automatically), and labeled using a fixed number of phonemes, each of which corresponds directly to the states of the model. The model assumes that the observed spectral vectors are generated by a Gaussian source. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the HMM. The model also incorporates variable-duration densities in each state. It is shown that the difficulties in developing an acoustic-phonetic model stem from the choice of the phonemes to be modeled, the selected parametrization of the data, and the appropriate choice of the variant of the ergodic HMM.
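As a rough illustration of the decoding idea described above, the following sketch runs Viterbi decoding over a fully connected (ergodic) HMM in which each state stands for one phoneme and emits spectral vectors from a Gaussian. It is not the authors' implementation: the function names, the diagonal-covariance assumption, the two-state toy parameters, and the one-dimensional "spectral" vectors are all hypothetical choices made for the example, and the variable-duration densities mentioned in the abstract are omitted for brevity.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def viterbi(obs, log_pi, log_A, means, variances):
    """Return the most likely state (phoneme) sequence for the vectors in obs."""
    n = len(log_pi)
    # delta[j]: best log score of any path that ends in state j at the current frame
    delta = [log_pi[j] + log_gauss(obs[0], means[j], variances[j]) for j in range(n)]
    back = []  # back[t][j]: best predecessor of state j at frame t + 1
    for x in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            # Ergodic topology: every state i is allowed to precede state j
            best_i = max(range(n), key=lambda i: prev[i] + log_A[i][j])
            ptr.append(best_i)
            delta.append(
                prev[best_i] + log_A[best_i][j]
                + log_gauss(x, means[j], variances[j])
            )
        back.append(ptr)
    # Trace the best path backwards from the best final state
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model: 2 "phoneme" states, 1-dimensional observations
log_pi = [math.log(0.5), math.log(0.5)]
# The transition matrix plays the role of the phonotactic model
log_A = [[math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)]]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
obs = [[0.1], [-0.2], [4.8], [5.3]]

path = viterbi(obs, log_pi, log_A, means, variances)
print(path)  # [0, 0, 1, 1]
```

In a model of the size the abstract mentions (about 40-50 states), the same recursion applies unchanged; only the parameter tables grow, and the decoded state sequence is read directly as a phoneme sequence.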
Bibliographic reference. Mokhtar, M. A. / Zein-el-Abddin, A. (1996): "A model for the acoustic phonetic structure of Arabic language using a single ergodic hidden Markov model", In ICSLP-1996, 330-333.