First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes the prosodic processing and waveform generation components of the text-to-speech (TTS) system being developed at Edinburgh University's Centre for Speech Technology Research (CSTR). Intonation is specified as a sequence of minimal descriptors whose locations are given in terms of syntactically determined prosodic domains. A pitch contour is computed by converting the descriptors into a sequence of abstract targets whose absolute values depend on a specific speaker model. Duration is determined first at the level of the syllable by a neural network and then accommodated at the segment level according to the distributions observed in a phonetically balanced database. The output waveform is generated by LPC resynthesis of diphone units. Three methods of diphone segmentation are discussed.
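The abstract keeps pitch targets abstract and lets a speaker model supply their absolute values. A minimal sketch of that separation follows. It assumes, hypothetically, that each target is a (time, level) pair with the level expressed as a fraction of the speaker's range, that the speaker model is nothing more than a floor and ceiling in Hz, and that F0 is interpolated linearly between targets. `SpeakerModel` and `targets_to_f0` are illustrative names, not code from the CSTR system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeakerModel:
    """Hypothetical speaker model: only the bottom and top of the pitch range, in Hz."""
    floor_hz: float
    ceiling_hz: float

    def to_hz(self, level: float) -> float:
        # Map an abstract target level (0 = floor, 1 = ceiling) onto this speaker's range.
        return self.floor_hz + level * (self.ceiling_hz - self.floor_hz)


def targets_to_f0(targets: List[Tuple[float, float]], speaker: SpeakerModel,
                  frame_s: float = 0.01) -> List[float]:
    """Turn (time_s, level) targets into one F0 value (Hz) per frame, from t = 0 up to
    the last target, by linear interpolation between consecutive targets."""
    if len(targets) < 2:
        raise ValueError("need at least two targets")
    pts = sorted((t, speaker.to_hz(level)) for t, level in targets)
    n_frames = int(round(pts[-1][0] / frame_s)) + 1
    f0: List[float] = []
    seg = 0  # index of the left-hand target of the current interpolation segment
    for i in range(n_frames):
        t = i * frame_s
        # Advance to the segment that contains time t.
        while seg < len(pts) - 2 and t > pts[seg + 1][0]:
            seg += 1
        (t0, hz0), (t1, hz1) = pts[seg], pts[seg + 1]
        if t <= t0:
            f0.append(hz0)  # hold the first target's value before it is reached
        elif t >= t1:
            f0.append(hz1)
        else:
            f0.append(hz0 + (hz1 - hz0) * (t - t0) / (t1 - t0))
    return f0
```

Rendering the same target sequence with `SpeakerModel(160.0, 320.0)` instead of `SpeakerModel(80.0, 160.0)` gives a contour exactly one octave higher; the intonational description itself does not change, only the speaker model does.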
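The abstract does not give the rule for accommodating a syllable duration at the segment level. One plausible scheme, assumed here, gives every segment in the syllable the same z-score k within its own log-duration distribution and solves for k so that the segment durations add up to the syllable duration predicted by the network. The per-phone statistics, the phone labels, and the names `accommodate` and `LogStats` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-phone statistics of log duration (natural log of seconds): (mean, std. dev.).
# In the described system such distributions would come from the phonetically balanced database.
LogStats = Dict[str, Tuple[float, float]]


def accommodate(syllable_s: float, phones: List[str], stats: LogStats,
                tol: float = 1e-9) -> List[float]:
    """Share a syllable duration among its phones using one z-score k for all of them:
    phone p gets exp(mean_p + k * sd_p) seconds, with k chosen so the total is syllable_s."""

    def total(k: float) -> float:
        return sum(math.exp(stats[p][0] + k * stats[p][1]) for p in phones)

    lo, hi = -10.0, 10.0
    if not total(lo) <= syllable_s <= total(hi):
        raise ValueError("syllable duration outside the range this search covers")
    # total(k) increases with k when every sd > 0, so bisection converges on the unique k.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) < syllable_s:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return [math.exp(stats[p][0] + k * stats[p][1]) for p in phones]


# Usage with made-up statistics for a "k a t" syllable.
stats: LogStats = {
    "k": (math.log(0.08), 0.30),
    "a": (math.log(0.12), 0.40),
    "t": (math.log(0.07), 0.35),
}
print(accommodate(0.27, ["k", "a", "t"], stats))  # roughly the means: 0.08, 0.12, 0.07
```

With k = 0 every segment takes its mean duration. A longer syllable pushes k upward, and segments with a wider spread (here the vowel) absorb proportionally more of the extra time.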
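LPC resynthesis, in its simplest form, analyses a frame into an all-pole filter, obtains the excitation (residual) by inverse filtering, and then drives the all-pole filter with an excitation to regenerate the signal. The sketch below uses the autocorrelation method with a Hamming window and the Levinson-Durbin recursion. The analysis order, the window, and the use of the unmodified residual are assumptions, since the abstract does not specify them; the function names are illustrative. Feeding the unmodified residual back in reconstructs the frame, which makes a convenient sanity check, whereas a diphone synthesizer would presumably modify the excitation (for example its pitch period) before resynthesis.

```python
import math
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve for the prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Returns the coefficients (a[0] == 1) and the final prediction-error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_analyse(frame: List[float], order: int) -> Tuple[List[float], float]:
    """Autocorrelation-method LPC on a Hamming-windowed frame (window choice is an assumption)."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame too short")
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    r = autocorrelation([s * w for s, w in zip(frame, win)], order)
    if r[0] <= 0.0:
        raise ValueError("silent frame: no LPC model")
    return levinson_durbin(r, order)


def inverse_filter(x: List[float], a: List[float]) -> List[float]:
    """Residual e[n] = sum_j a[j] x[n-j], treating samples before the frame as zero."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def synthesize(e: List[float], a: List[float]) -> List[float]:
    """All-pole resynthesis y[n] = e[n] - sum_{j>=1} a[j] y[n-j], from a zero initial state."""
    y: List[float] = []
    for n, en in enumerate(e):
        y.append(en - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


# Usage on a synthetic two-tone frame (a stand-in for a speech frame).
frame = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.13 * n)
         for n in range(200)]
a, err = lpc_analyse(frame, order=10)
rebuilt = synthesize(inverse_filter(frame, a), a)
print(max(abs(u - v) for u, v in zip(frame, rebuilt)))  # tiny: rounding error only
```

Because the inverse filter and the all-pole synthesis filter are exact inverses, any difference between `frame` and `rebuilt` comes from floating-point rounding alone.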
Bibliographic reference. Campbell, W. Nick / Isard, Stephen D. / Monaghan, Alex I. C. / Verhoeven, J. (1990): "Duration, pitch and diphones in the CSTR TTS system", In ICSLP-1990, 825-828.