Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
PROCSY is a hybrid method of automatically producing natural-sounding formant-based synthetic speech from an existing speech signal by using copy-synthesis and estimated articulatory trajectories as input to the HLsyn™ synthesizer (Sensimetrics Corporation). The purpose is to allow controlled manipulation of selected acoustic parameters. Parameters for HLsyn are derived from labelled speech files in two ways. Broadly, vowels and approximants are copy-synthesized from the acoustic signal, while obstruents and nasals are synthesized by rule: articulatory trajectories and constriction areas are estimated from the segment label and duration, and converted into HL parameter values. HLsyn combines information from both sources to calculate parameter values for a Klatt-type synthesizer. Strengths of the method are (i) simple HLsyn input captures acoustically complex obstruents, and (ii) HLsyn parameters automatically produce complex acoustic properties that accompany consonantal closures, especially at segment boundaries. These properties are hard to synthesize and thus typically absent in formant TTS, yet they provide some of the systematic variability we hypothesize contributes to robust, natural-sounding synthesis. Potential applications are discussed.
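The two-way split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the HLsyn interface: the `Segment` type, the class names, the function names, and every numeric default (open area, ramp time, frame step) are hypothetical, chosen only to show how a labelled segment might be routed either to copy-synthesis or to a rule that estimates a constriction-area trajectory from label and duration.

```python
from dataclasses import dataclass

# Hypothetical segment classes, following the broad split in the abstract.
COPY_CLASSES = {"vowel", "approximant"}   # parameters copied from the signal
RULE_CLASSES = {"obstruent", "nasal"}     # parameters estimated by rule


@dataclass
class Segment:
    """One entry of a labelled speech file (illustrative structure only)."""
    label: str
    seg_class: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def choose_strategy(seg: Segment) -> str:
    """Return 'copy' or 'rule' depending on the segment class."""
    if seg.seg_class in COPY_CLASSES:
        return "copy"
    if seg.seg_class in RULE_CLASSES:
        return "rule"
    raise ValueError(f"unknown segment class: {seg.seg_class!r}")


def closure_trajectory(duration_ms, open_area=100.0, ramp_ms=20.0, step_ms=5.0):
    """Piecewise-linear constriction area over a segment (assumed shape).

    The area ramps from open_area down to 0, holds the closure, then ramps
    back up. All values are placeholders, not figures from the paper.
    Returns a list of (time_ms, area) pairs.
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    ramp = min(ramp_ms, duration_ms / 2)
    n_steps = int(duration_ms // step_ms)
    times = [i * step_ms for i in range(n_steps + 1)]
    if times[-1] < duration_ms:
        times.append(duration_ms)  # always sample the segment end
    trajectory = []
    for t in times:
        if t < ramp:                      # closing gesture
            area = open_area * (1 - t / ramp)
        elif t > duration_ms - ramp:      # release gesture
            area = open_area * (1 - (duration_ms - t) / ramp)
        else:                             # full closure
            area = 0.0
        trajectory.append((t, area))
    return trajectory


def plan(segments):
    """Pair each segment label with the strategy chosen for it."""
    return [(seg.label, choose_strategy(seg)) for seg in segments]
```

In this sketch only the routing decision and the rule-based trajectory are modelled; the copy-synthesis step and the conversion to synthesizer parameters are left out because the abstract does not specify them.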
Bibliographic reference. Heid, Sebastian / Hawkins, Sarah (1998): "PROCSY: A Hybrid Approach to High-quality Formant Synthesis using HLSyn", In SSW3-1998, 219-224.