We describe two techniques for phoneme identification, each of which gives a plausibility for every phoneme at every frame slot. The first, non-linear vectorial interpolation, assumes that the correlations between vectors of the input speech, as well as between components within vectors, are specific to phonetic units; these correlations are extracted using non-linear vector interpolators trained for each phoneme to minimize interpolation error. Three categories of interpolators are introduced, according to the quantities to be interpolated: vector-pair, vector-center and component-pair. The second, reference comparison, estimates the plausibility by template matching, which consists of computing, slot by slot, the maximum of the average similarity between references and input speech vectors. We evaluated the performance of the four models (the reference comparison model and the three non-linear interpolator models) using a phoneme-based continuous speech recognition system. Throughout the experiments, the four models were connected in turn to the same large-vocabulary continuous speech recognition system to provide its phoneme recognition input. In speaker-dependent tests using 30 sentences as training data and 12 LPCC-derived cepstral coefficients as parametric vectors, the reference comparison method yielded the best global recognition rate, while the vector-pair model appeared highly promising.
Keywords: phoneme recognition, non-linear interpolation, artificial neural networks
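The reference comparison step described above (the maximum, over references, of the average similarity with the input vectors at a frame slot) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or how references are aligned to a slot, so cosine similarity, centering each template on the slot, and all function names below are assumptions made purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def plausibility(frames, references, t):
    """Plausibility of one phoneme at frame slot t.

    frames: list of input parametric vectors (e.g. cepstral coefficients).
    references: list of templates for the phoneme, each a list of vectors.
    Each template is centred on slot t (an assumption); the score is the
    maximum over templates of the mean frame-wise similarity.
    Returns -inf if no template fits inside the utterance at t.
    """
    best = float("-inf")
    for template in references:
        start = t - len(template) // 2
        if start < 0 or start + len(template) > len(frames):
            continue  # template would run past the utterance boundary
        sims = [
            cosine_similarity(frames[start + k], template[k])
            for k in range(len(template))
        ]
        best = max(best, sum(sims) / len(sims))
    return best


def plausibility_table(frames, references_by_phoneme):
    """Plausibility for all phonemes and all frame slots."""
    return {
        phoneme: [plausibility(frames, refs, t) for t in range(len(frames))]
        for phoneme, refs in references_by_phoneme.items()
    }
```

In this sketch the resulting table, indexed by phoneme and then by frame slot, stands in for the phoneme recognition input passed to the continuous speech recognizer; a distance-based similarity or a different alignment would fit the same structure.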
Bibliographic reference. Gong, Yifan / Haton, Jean-Paul (1991): "Comparing two phoneme identification methods using a continuous speech recognizer", In EUROSPEECH-1991, 417-420.