4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Non-segmental Analysis and Synthesis Based on a Speech Database

Andrew Slater, John Coleman

Phonetics Laboratory, University of Oxford, Oxford, UK

This paper reports on experiments in non-segmental speech analysis and synthesis using parameters derived from a speech database of British English monosyllables. The database includes almost every onset, nucleus and coda, and almost all onset-nucleus and nucleus-consonant combinations occurring in English. Acoustic parameters, including fundamental frequency (f0), formant frequencies and bandwidths, and amplitude of voicing, were determined for each token in the database. Fine duration differences within minimal pairs are analyzed using dynamic time warping, which avoids the need for manual segmentation. For each parameter, a matrix of distances between every pair of samples from the two words is calculated, together with the minimum-cost path through that matrix (the warp path). The set of warp paths for all parameters identifies the nature and location of acoustic differences between the words, including locations of temporal expansion and compression. Preliminary experiments using dynamic time warping for non-segmental synthesis are also discussed.
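
The distance matrix and warp path described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single one-dimensional parameter track per word (for example, f0 sampled at a fixed frame rate), absolute difference as the local distance, and the simplest symmetric step pattern (diagonal, vertical, horizontal). The names `dtw` and `stretched_frames` are hypothetical.

```python
def dtw(a, b):
    """Align two parameter tracks; return (total_cost, warp_path).

    Assumptions (not from the paper): local distance is |x - y| and
    the allowed steps are diagonal, vertical and horizontal.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # Matrix of distances between all samples of the two words.
    dist = [[abs(x - y) for y in b] for x in a]
    # Accumulated cost of the cheapest path reaching each cell.
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
            )
            acc[i][j] = dist[i][j] + best
    # Backtrack from the final cell to recover the warp path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(steps)  # ties resolve in favour of the diagonal
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


def stretched_frames(path):
    """Frames of the first track that map to more than one frame of
    the second, i.e. where the second word is temporally expanded."""
    return sorted({i0 for (i0, j0), (i1, j1) in zip(path, path[1:])
                   if i1 == i0 and j1 == j0 + 1})


cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(cost, path, stretched_frames(path))
```

In this toy case the second track holds its middle value for one extra frame, so the warp path contains a horizontal step at that point. Horizontal and vertical runs in a warp path are the kind of evidence for temporal expansion and compression that the abstract refers to. A real analysis would compute one path per acoustic parameter and compare them.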

