Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Unrestricted Text-To-Speech Revisited: Rhythm and Intonation

David R. Hill, Craig-Richard Schock, Leonard C. Manzara

Computer-Human Systems Lab, Dept. of Computer Science, University of Calgary, Calgary, Alberta, Canada

A new speech-synthesis-by-rules system has been developed at the University of Calgary in an object-oriented programming environment on the NeXT computer. This paper outlines the models used to create rhythm and intonation for the synthesised speech. A companion paper (DEGAS: a system for rule-based diphone speech synthesis) outlines the framework for segment specification.

The rhythm model is based on data obtained from real speech and represents a continuation of earlier work. The difficulties in reconciling the structure used for synthesis with the structure assumed for traditional segmental analysis are outlined. The intonation model is based on the descriptive framework developed by M.A.K. Halliday, as used for teaching non-native speakers to produce reasonable intonation for spoken English. Encouraging results from preliminary subjective testing of the intonation model, using utterances synthesised with the system, are described. On the basis of these tests, together with informal evaluation, we conclude that the speech produced represents a significant improvement in naturalness and intelligibility. The research has provided the basis for a new commercial text-to-speech system now being marketed by Trillium Sound Research Inc., a successful technology transfer exercise.

Bibliographic reference. Hill, David R. / Schock, Craig-Richard / Manzara, Leonard C. (1992): "Unrestricted text-to-speech revisited: rhythm and intonation", In ICSLP-1992, 1219-1222.