A hidden Markov model (HMM) whose observations depend on variable-length segments is presented. The model is a form of n-gram-constrained HMM in which the segment length n is determined by maximizing the likelihood of the observations, in both training and recognition. The aim is to better capture the temporal correlation structure. This structure is assumed to be time-varying because of the nonstationary and utterance-specific characteristics of speech, in contrast to the constant temporal structures assumed in many existing HMM approaches. In constructing the model, a variable-to-fixed-length conversion is introduced. A segment-length weighting scheme based on an information-theoretic criterion is also proposed; it controls the search for the optimal segment lengths and compensates for the effect of the segment-length conversion. The model is estimated using a modified Viterbi algorithm that performs joint state-sequence decoding and segmentation. Both speaker-dependent and speaker-independent recognition experiments clearly show the advantage of this optimized segment-dependent structure over a model based on fixed-length segments.
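The abstract does not spell out the modified Viterbi algorithm. As a rough illustration only, the sketch below shows a generic segmental (semi-Markov-style) Viterbi recursion that jointly chooses a state sequence and a segmentation. It is a swapped-in textbook technique, not the authors' formulation. Everything specific in it is a hypothetical assumption: the one-dimensional observations, the per-state AR(1) segment model (`means`, `variances`, `rhos`), the maximum segment length `max_len`, and the `length_log_weight` callable, which merely stands in for the paper's information-theoretic segment-length weighting, whose exact form is not given here.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def segment_loglik(obs, start, end, mean, var, rho):
    """Log-likelihood of obs[start:end] under a HYPOTHETICAL AR(1) segment model:
    the first frame is N(mean, var); each later frame is predicted from its
    predecessor within the same segment, so frames inside a segment are dependent."""
    ll = gauss_logpdf(obs[start], mean, var)
    for t in range(start + 1, end):
        pred = mean + rho * (obs[t - 1] - mean)
        ll += gauss_logpdf(obs[t], pred, var)
    return ll


def segmental_viterbi(obs, log_init, log_trans, means, variances, rhos,
                      max_len, length_log_weight):
    """Jointly decode states and segment boundaries by dynamic programming.

    delta[t][j] is the best log score of any segmentation of obs[:t] whose
    last segment ends at frame t and is labelled with state j.
    Returns (best_score, [(start, end, state), ...])."""
    T, S = len(obs), len(means)
    NEG = -math.inf
    delta = [[NEG] * S for _ in range(T + 1)]
    back = [[None] * S for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in range(S):
            for n in range(1, min(max_len, t) + 1):  # candidate segment length
                s = t - n
                emit = (segment_loglik(obs, s, t, means[j], variances[j], rhos[j])
                        + length_log_weight(n))  # hypothetical length weighting
                if s == 0:  # first segment of the utterance
                    score = log_init[j] + emit
                    if score > delta[t][j]:
                        delta[t][j], back[t][j] = score, (n, None)
                    continue
                for i in range(S):  # previous segment's state
                    if delta[s][i] == NEG:
                        continue
                    score = delta[s][i] + log_trans[i][j] + emit
                    if score > delta[t][j]:
                        delta[t][j], back[t][j] = score, (n, i)
    # Backtrack from the best final state to recover the segmentation.
    j = max(range(S), key=lambda k: delta[T][k])
    best = delta[T][j]
    segments, t = [], T
    while t > 0:
        n, prev = back[t][j]
        segments.append((t - n, t, j))
        t, j = t - n, prev
    segments.reverse()
    return best, segments


if __name__ == "__main__":
    obs = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
    half = math.log(0.5)
    best, segs = segmental_viterbi(
        obs, [half, half], [[half, half], [half, half]],
        means=[0.0, 5.0], variances=[1.0, 1.0], rhos=[0.0, 0.0],
        max_len=3, length_log_weight=lambda n: 0.0)
    print(segs)
```

With uniform transitions, every extra segment costs one more transition term, so in this toy run the decoder prefers the fewest, longest segments that still fit the data. A non-trivial `length_log_weight` would shift that balance, which is the role the abstract gives to its segment-length weighting scheme.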
Bibliographic reference. Mingy, Ji / O'Boyle, Peter / Smith, Jack (1995): "An HMM with optimized segment-dependent observations for speech recognition", In EUROSPEECH-1995, 1475-1478.