We propose to use a stochastic segmental duration model, independent of the HMM, in INRS's large-vocabulary continuous speech recognizer. First, we examine how to insert this model into the search algorithm without violating the algorithm's optimality constraints. Second, we propose four different duration models and test their performance. The models are trained and tested on a studio-quality, speaker-dependent speech corpus. The first model is a rule-based model that imposes minimum and maximum phone durations. The second is a Gaussian mixture phone duration model independent of the phonemic context. The third is a Gaussian mixture phone duration model dependent on the right or left phoneme context. The fourth is a Gaussian mixture duration model based on the variation of duration within a diphone. Performance comparisons show that the best model is the first, which imposes hard constraints on duration. This model raises the word recognition rate from 89.58% (no duration modeling) to 90.11%.
Keywords: segmental duration, prosody, suprasegmental features, Markov source-based continuous speech recognizer, large vocabulary
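The two families of duration model named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the phone labels, the duration bounds, the mixture parameters, and the additive log-domain combination with a `duration_weight` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical per-phone (min, max) durations in frames; illustrative values only.
DURATION_BOUNDS = {"a": (3, 30), "t": (2, 15)}


def hard_duration_score(phone, n_frames, bounds=DURATION_BOUNDS):
    """Rule-based model: 0.0 inside [min, max], -inf outside (hypothesis is pruned)."""
    lo, hi = bounds[phone]
    return 0.0 if lo <= n_frames <= hi else -math.inf


def gmm_duration_log_prob(n_frames, weights, means, variances):
    """Log-density of a duration under a one-dimensional Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (n_frames - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def rescore(acoustic_log_prob, duration_score, duration_weight=1.0):
    """Assumed combination: add the weighted duration score to the acoustic log-score."""
    return acoustic_log_prob + duration_weight * duration_score
```

Under these assumptions, a segment hypothesis whose duration falls outside the bounds receives a score of negative infinity and can never win the search, whereas the mixture models only reweight hypotheses softly.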
Bibliographic reference. Dumouchel, Pierre / O'Shaughnessy, Douglas (1995): "Segmental duration and HMM modeling", In EUROSPEECH-1995, 803-806.