Second ESCA/IEEE Workshop on Speech Synthesis

September 12-15, 1994
Mohonk Mountain House, New Paltz, NY, USA

A Model of Timing for Non-segmental Phonological Structure

John Local, Richard Ogden

Experimental Phonetics Laboratory, University of York, UK

One of the enduring problems in achieving natural-sounding synthetic speech is that of getting the rhythm right. Usually this problem is construed as the search for appropriate algorithms for altering the durations of segments under various contextual conditions (e.g. initial versus final position in a word or phrase, stressed versus unstressed syllables). Recently, Campbell and Isard (1991) have suggested that a more effective model is one in which the syllable is taken as the distinguished timing unit and segmental durations are accommodated secondarily to syllable durations. We propose here that there is no distinguished timing unit. While other synthesis systems use phonemes, diphones or other linearly arranged phone-sized units and employ 'hidden structure', YorkTalk uses explicit tree-like phonological representations.
We will compare the temporal characteristics of the output of the YorkTalk system with Klattalk (Klatt, ms) on the one hand and the naturalistic observations of Fowler (1981) on the other. We will show that it is possible to produce similar, natural-sounding temporal relations by employing linguistic structures which are given a compositional parametric and temporal interpretation (Local, 1992; Ogden, 1992).
YorkTalk's metrical and phonotactic parsers parse input into structures consisting of feet, syllables and syllable constituents. In these structures, the rime is the head of the syllable, the nucleus is the head of the rime and the strong syllable is the head of the foot (cf Coleman 1992). Every node in the graph is given a head-first temporal and parametric phonetic interpretation. A co-production model of coarticulation (cf Fowler 1980) is implemented in YorkTalk by overlaying parameters. Since the nucleus is the head of the syllable, the nucleus and syllable are coextensive. Because the onset and coda are fitted within the temporal space of the nucleus, they inherit the properties of the whole syllable. Where structures permit, constituents are shared between syllables (ambisyllabicity). The temporal interpretation of ambisyllabicity is the temporal and parametric overlaying of one syllable on another (Local, 1992; Ogden, 1992).
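The head-first temporal interpretation described above can be illustrated with a minimal sketch. It is not YorkTalk's implementation: the class and function names (Interval, Syllable, interpret_foot) and all durations in milliseconds are hypothetical, and the sketch models only timing intervals, not the overlaying of synthesis parameters. It assumes that the nucleus fixes the span of the syllable, that the onset and coda are placed at the left and right edges of that span, and that an ambisyllabic consonant is modelled by starting the next syllable at the onset of the shared coda.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # milliseconds
    end: float    # milliseconds


@dataclass
class Syllable:
    # All durations are hypothetical illustration values, not YorkTalk data.
    nucleus_ms: float
    onset_ms: float = 0.0
    coda_ms: float = 0.0

    def interpret(self, start: float) -> dict:
        # Head first: the nucleus is interpreted before its dependents and
        # is coextensive with the syllable.
        nucleus = Interval(start, start + self.nucleus_ms)
        # Onset and coda are fitted inside the nucleus span (overlaid on it),
        # at its left and right edges respectively.
        onset = Interval(nucleus.start, nucleus.start + self.onset_ms)
        coda = Interval(nucleus.end - self.coda_ms, nucleus.end)
        return {"syllable": nucleus, "onset": onset, "coda": coda}


def interpret_foot(syllables, ambisyllabic):
    """Lay out syllables left to right.

    ambisyllabic[i] is True when the coda of syllable i is shared as the
    onset of syllable i + 1; the next syllable is then overlaid on the
    previous one, starting where the shared consonant starts.
    """
    spans = []
    start = 0.0
    for i, syl in enumerate(syllables):
        result = syl.interpret(start)
        spans.append(result)
        start = result["syllable"].end
        if i < len(ambisyllabic) and ambisyllabic[i]:
            start -= syl.coda_ms  # overlap by the shared consonant
    return spans


# A two-syllable foot whose medial consonant is ambisyllabic.
foot = [
    Syllable(nucleus_ms=200.0, onset_ms=60.0, coda_ms=70.0),
    Syllable(nucleus_ms=150.0, onset_ms=70.0),
]
spans = interpret_foot(foot, ambisyllabic=[True])
for s in spans:
    print(s)
```

In this sketch the shared consonant occupies the same interval whether it is read as the coda of the first syllable or the onset of the second, and the foot is shorter than the sum of its syllable durations by the length of the overlap.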


Bibliographic reference.  Local, John / Ogden, Richard (1994): "A model of timing for non-segmental phonological structure", In SSW2-1994, 236-239.