Temporal decomposition is a technique for modeling the spectral evolution of speech. Under this approach, a speech segment is described as a linear combination of a small number of spectral targets. The contributions of the targets are expressed by non-uniformly spaced interpolation functions that are constrained to be of limited duration. This model provides a mathematical representation of the acoustic structure of speech that allows reconstruction by simple linear combinations. It could thus be applied to speech synthesis as well as to speech recognition. The aim of the work presented here was to evaluate how consistently the temporal decomposition technique provides target vectors (which contain the spectral information) and interpolation functions (which give a time segmentation). A protocol of experiments was therefore designed and carried out. Segmentation performance was evaluated automatically, using a manual phonemic labeling as a reference. Approximately 85% of the phonemes in the normative transcription are directly retrieved by temporal decomposition, along with 35% insertions. Dictionaries of spectral targets were built and compared, in basic pattern recognition experiments, with other dictionaries built from the original spectral parameters. In most cases, target vectors and original parameters performed equally well, which suggests that no significant spectral information is lost through temporal decomposition.
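As an illustration of the model described in the abstract, the following minimal sketch reconstructs a sequence of spectral frames as a linear combination of a few targets, each weighted by an interpolation function that is non-zero only over a limited span of frames. It is not the authors' implementation: the piecewise-linear shape of the interpolation functions, the function names (`make_interpolation_functions`, `reconstruct`), and all numerical values are assumptions chosen for readability.

```python
# Hypothetical sketch of the temporal decomposition model:
#   y(n) = sum_k a_k * phi_k(n)
# a_k   : spectral target vectors (made-up values here)
# phi_k : interpolation functions of limited duration (assumed piecewise linear)

def make_interpolation_functions(centers, n_frames):
    """Build one function per target; each peaks at its own center and
    is zero outside the interval between its neighboring centers."""
    phis = []
    for k, c in enumerate(centers):
        left = centers[k - 1] if k > 0 else None
        right = centers[k + 1] if k < len(centers) - 1 else None
        phi = []
        for n in range(n_frames):
            if n == c:
                value = 1.0
            elif left is not None and left < n < c:
                value = (n - left) / (c - left)      # rising edge
            elif right is not None and c < n < right:
                value = (right - n) / (right - c)    # falling edge
            else:
                value = 0.0                          # limited duration
            phi.append(value)
        phis.append(phi)
    return phis

def reconstruct(targets, phis):
    """Return the frames y(n) as linear combinations of the targets."""
    n_frames = len(phis[0])
    dim = len(targets[0])
    return [
        [sum(targets[k][i] * phis[k][n] for k in range(len(targets)))
         for i in range(dim)]
        for n in range(n_frames)
    ]

# Three hypothetical 2-dimensional targets, located at non-uniformly
# spaced frames 0, 4 and 10 of an 11-frame segment.
targets = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
centers = [0, 4, 10]
phis = make_interpolation_functions(centers, 11)
frames = reconstruct(targets, phis)

# A crude segmentation: index of the largest interpolation function per frame.
dominant = [max(range(len(phis)), key=lambda k: phis[k][n]) for n in range(11)]
```

In this sketch each frame is influenced by at most two adjacent targets, and the frames at the centers reproduce the targets exactly. An actual analysis would estimate both the targets and the interpolation functions from the speech parameters rather than fixing them by hand.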
Bibliographic reference. Bimbot, Frederic / Atal, Bishnu S. (1991): "An evaluation of temporal decomposition", In EUROSPEECH-1991, 1089-1092.