4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A Probabilistic Framework for Feature-based Speech Recognition

James Glass, Jane Chang, Michael McCandless

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

Most current speech recognizers use an observation space based on a temporal sequence of "frames" (e.g., Mel-cepstra). Another class of recognizer further processes these frames to produce a segment-based network and represents each segment by fixed-dimensional "features." In such feature-based recognizers the observation space takes the form of a temporal network of feature vectors, so that any single segmentation of an utterance uses only a subset of all possible feature vectors. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. We report experimental results for phonetic recognition on the TIMIT corpus, where we achieved results of 64.1% (context-independent) and 69.5% (context-dependent, using diphones) on the core test set.
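As a rough illustration of the search problem described above (not the authors' implementation), the following sketch runs a Viterbi-style best-path search over a small segment network. Each segment spans two time boundaries and carries per-label log scores, and any one path through the network uses only some of the segments. The sketch simply assumes the scores have already been normalized so that paths with different numbers of segments can be compared; it does not implement the paper's normalization criterion. The names `Segment` and `best_segmentation`, and all toy scores, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    start: int                # index of the left time boundary
    end: int                  # index of the right time boundary (end > start)
    scores: Dict[str, float]  # assumed: normalized log score per label


def best_segmentation(
    segments: List[Segment], n_boundaries: int
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Find the best-scoring path from boundary 0 to the last boundary."""
    neg_inf = float("-inf")
    best = [neg_inf] * n_boundaries
    back: List[Optional[Tuple[int, str]]] = [None] * n_boundaries
    best[0] = 0.0

    # Visit segments in order of end boundary. Because start < end,
    # best[seg.start] is final by the time seg is considered.
    for seg in sorted(segments, key=lambda s: s.end):
        if best[seg.start] == neg_inf:
            continue  # no path reaches this segment's left boundary
        label, score = max(seg.scores.items(), key=lambda kv: kv[1])
        total = best[seg.start] + score
        if total > best[seg.end]:
            best[seg.end] = total
            back[seg.end] = (seg.start, label)

    last = n_boundaries - 1
    if best[last] == neg_inf:
        raise ValueError("no complete path through the segment network")

    # Trace back from the final boundary to recover the chosen segments.
    path: List[Tuple[int, int, str]] = []
    b = last
    while b != 0:
        start, label = back[b]
        path.append((start, b, label))
        b = start
    path.reverse()
    return best[last], path


# Toy network with 4 boundaries (0..3); all scores are made up.
toy = [
    Segment(0, 1, {"a": -1.0, "b": -2.0}),
    Segment(1, 2, {"a": -1.5, "b": -0.5}),
    Segment(2, 3, {"c": -1.0}),
    Segment(0, 2, {"a": -1.25}),
    Segment(1, 3, {"c": -3.0}),
]
toy_score, toy_path = best_segmentation(toy, 4)
```

In this toy network the two-segment path (0 to 2, then 2 to 3) beats the three-segment path, so the segments spanning boundaries 0 to 1, 1 to 2, and 1 to 3 are never used. This is the sense in which a single segmentation draws on only a subset of the available feature vectors, and it is why some form of score normalization is needed before paths of different lengths are compared.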


Bibliographic reference. Glass, James / Chang, Jane / McCandless, Michael (1996): "A probabilistic framework for feature-based speech recognition", in ICSLP-1996, 2277-2280.