We compare a continuous hidden Markov model (CHMM) with a hybrid system that combines a Time Delay Neural Network (TDNN) and an HMM for speaker-dependent continuous speech recognition. In the hybrid system, the network pre-processes the speech signal for a discrete HMM. Several hybrid systems combining neural networks and HMMs have previously been compared with basic discrete HMMs that use one codebook and no contextual phone models; against that baseline, the hybrid systems give better results. The TDNN is powerful at extracting discriminant features, but this ability degrades as the difficulty of the task increases. Instead of training a single large neural network to capture all the regions of the feature space, we structured the network into sub-networks: one discriminates between macro-classes of phonemes, and each macro-class has its own sub-network that discriminates between the phonemes of that class. Each of these sub-networks achieves high performance. We test two ways of integrating the sub-networks into the HMM formalism: combining all the sub-networks into a global network, which includes a new training phase, and integrating the hierarchically structured sub-networks directly into the HMM structure.
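The hierarchical structure described above can be illustrated with a minimal sketch. It assumes that the macro-class network and the per-class sub-networks each produce raw scores, that the scores are normalized with a softmax, and that a phoneme score is obtained as P(macro-class) times P(phoneme | macro-class). The abstract does not specify how the sub-network outputs are combined, so this product rule, the macro-class inventory, the phoneme labels, and the function names `combine_hierarchical` and `quantize` are all hypothetical and serve only as illustration.

```python
import math

# Hypothetical macro-classes and phoneme labels (not taken from the paper).
MACRO_CLASSES = {
    "vowels": ["aa", "iy", "uw"],
    "plosives": ["p", "t", "k"],
    "nasals": ["m", "n"],
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_hierarchical(macro_scores, sub_scores):
    """Assumed combination rule: P(phoneme) = P(macro) * P(phoneme | macro).

    macro_scores: dict mapping macro-class name to a raw score.
    sub_scores:   dict mapping macro-class name to a list of raw scores,
                  one per phoneme of that class.
    """
    names = list(MACRO_CLASSES)
    macro_post = softmax([macro_scores[name] for name in names])
    phoneme_post = {}
    for name, p_macro in zip(names, macro_post):
        within = softmax(sub_scores[name])
        for phoneme, p_within in zip(MACRO_CLASSES[name], within):
            phoneme_post[phoneme] = p_macro * p_within
    return phoneme_post


def quantize(phoneme_post):
    """Pick the most probable phoneme as a discrete symbol.

    Assumption: a discrete HMM consumes one symbol per frame.
    """
    return max(phoneme_post, key=phoneme_post.get)


# Example frame with made-up scores.
frame_macro = {"vowels": 2.0, "plosives": 0.5, "nasals": -1.0}
frame_sub = {
    "vowels": [0.1, 1.5, -0.3],
    "plosives": [0.0, 0.2, 0.1],
    "nasals": [1.0, -1.0],
}
posteriors = combine_hierarchical(frame_macro, frame_sub)
symbol = quantize(posteriors)
```

Under these assumptions the combined scores form a proper distribution over all phonemes, because each sub-network's probabilities sum to 1 within its class and the macro-class probabilities sum to 1 across classes. Each sub-network also only has to separate a few phonemes, which is the motivation the abstract gives for splitting the network.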
Bibliographic reference. Devillers, Laurence / Dugast, Christian (1991): "Comparison of continuous mixture densities and TDNN in a viterbi-framework: experiments on speaker dependent DARPA RM1+", In EUROSPEECH-1991, 991-994.