EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Must Diphone Synthesis be so Unnatural?

William Barry (1), Claus Nielsen (2), Ove Andersen (2)

(1) University of the Saarland, Germany (2) Aalborg University, Denmark

An English utterance was synthesized in four versions using sets of diphones produced under four different prosodic and contextual conditions. The synthesis used either accented diphones only or appropriately located accented and unaccented diphones, and each of these conditions was repeated using neutral-context and differentiated-context diphones. The four versions were presented to two listener groups, one native English and one non-native, for paired-comparison acceptability judgements. The results show a massive preference for the stress- and context-differentiated condition. Both stress and context had a significant effect on acceptability judgements, but context differentiation raised acceptability more strongly than stress differentiation did. The native listeners and the main sub-group of non-native listeners judged the stimuli in essentially the same way.
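The design above crosses two binary factors (stress handling and context handling) to give four synthesis versions, which are then judged in pairs. The minimal Python sketch below only enumerates that 2 x 2 design and the resulting unordered comparison pairs, and tallies how often each version wins. The condition labels and the `preference_scores` helper are hypothetical illustrations. The abstract does not say how many trials were run, whether both presentation orders were used, or how judgements were aggregated, so this is not the authors' procedure.

```python
from itertools import combinations, product

# Hypothetical labels for the two binary factors described in the abstract.
STRESS = ("accented-only", "stress-differentiated")
CONTEXT = ("neutral-context", "context-differentiated")

# 2 x 2 crossing gives the four synthesis versions of the same utterance.
CONDITIONS = list(product(STRESS, CONTEXT))


def comparison_pairs(conditions):
    """Return all unordered pairs of distinct versions.

    Assumption: each pair is counted once; the abstract does not state
    whether both presentation orders (A-B and B-A) were used.
    """
    return list(combinations(conditions, 2))


def preference_scores(judgements):
    """Count how often each version was preferred (hypothetical helper).

    `judgements` is a list of (version_a, version_b, preferred) tuples,
    where `preferred` must be either version_a or version_b.
    """
    scores = {c: 0 for c in CONDITIONS}
    for a, b, preferred in judgements:
        if preferred not in (a, b):
            raise ValueError("preferred version must be one of the pair")
        scores[preferred] += 1
    return scores


if __name__ == "__main__":
    pairs = comparison_pairs(CONDITIONS)
    print(len(CONDITIONS), len(pairs))  # 4 6
```

With four versions there are 4 * 3 / 2 = 6 distinct pairs; presenting both orders would double this to 12 trials per listener.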


Bibliographic reference.  Barry, William / Nielsen, Claus / Andersen, Ove (2001): "Must diphone synthesis be so unnatural?", In EUROSPEECH-2001, 975-978.