4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Analysis of Context-dependent Segmental Duration for Automatic Speech Recognition

Xue Wang (1), Louis C. W. Pols (1), Louis F. M. ten Bosch (1,2)

(1) Institute of Phonetic Sciences / IFOTT, University of Amsterdam, The Netherlands
(2) Lernout & Hauspie Speech Products N.V., Brussels, Belgium

This paper presents statistical analyses of context-dependent phone durations in the hand-segmented TIMIT database, with the aim of improving automatic speech recognition. Two main approaches were used. (1) Duration distributions were obtained for individual contextual factors, such as broad classes defined by long or short vowels, word stress, syllable position within the word and within the utterance, postvocalic consonants, and utterance speaking rate. (2) A hierarchically structured analysis of variance was used to quantify the contributions of 11 different contextual factors to the variation in duration.
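
As a rough illustration of the second approach, the sketch below computes the share of duration variance attributable to a single contextual factor (eta squared from a one-way analysis of variance). It is a simplification and not the hierarchically structured analysis described in the abstract. The function name `variance_explained`, the factor labels, and the duration values are all hypothetical and are not taken from TIMIT.

```python
from collections import defaultdict
from statistics import mean


def variance_explained(durations, labels):
    """Return eta squared: between-group sum of squares / total sum of squares.

    durations: phone durations (e.g. in milliseconds)
    labels:    one contextual-factor level per duration (e.g. stress level)
    """
    if len(durations) != len(labels) or not durations:
        raise ValueError("durations and labels must be non-empty and equal in length")

    grand_mean = mean(durations)

    # Group the durations by the level of the contextual factor.
    groups = defaultdict(list)
    for duration, label in zip(durations, labels):
        groups[label].append(duration)

    ss_total = sum((d - grand_mean) ** 2 for d in durations)
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups.values()
    )
    # With no variation at all, no factor can explain anything.
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical vowel durations in milliseconds (invented for illustration).
durations_ms = [120.0, 130.0, 110.0, 70.0, 80.0, 60.0]
stress = ["stressed"] * 3 + ["unstressed"] * 3
eta_sq = variance_explained(durations_ms, stress)
```

A value of `eta_sq` close to 1 would indicate that the factor accounts for most of the observed duration variation in the sample; comparing such values across factors gives only a first impression, since a one-way analysis ignores interactions and nesting between factors.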

Bibliographic reference. Wang, Xue / Pols, Louis C. W. / Bosch, Louis F. M. ten (1996): "Analysis of context-dependent segmental duration for automatic speech recognition", in ICSLP-1996, 1181-1184.