EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

Automatic Segmentation of Recorded Speech into Syllables for Speech Synthesis

Eric Lewis (1), Mark Tatham (2)

(1) University of Bristol, UK; (2) University of Essex, UK

Concatenated waveform text-to-speech synthesis systems require an inventory of stored waveforms from which units of speech can be extracted for subsequent rearrangement and concatenation as needed. In previous papers [1], [2] we have argued that, for natural-sounding speech, the syllable should be the preferred unit. The mark-up of the stored waveforms for segmentation into syllables must be precise, and for our MeteoSPRUCE limited-domain system the mark-up has so far been done by manual editing. In this paper we describe how most of the segmentation can be done automatically, leaving only those waveforms that would be prone to error to be segmented manually. With automatic labelling of both the pitch periods and the syllables, the task of generating different synthetic voices to order becomes feasible.

Bibliographic reference. Lewis, Eric / Tatham, Mark (2001): "Automatic segmentation of recorded speech into syllables for speech synthesis", in EUROSPEECH-2001, 1703-1706.