5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Prosodic vs. Segmental Contributions to Naturalness in a Diphone Synthesizer

H. Timothy Bunnell (1), Steve R. Hoskins (2), Debra Yarrington (2)

(1) The duPont Hosp. for Children & Univ. of DE, USA
(2) University of Delaware & duPont Hosp. for Children, USA

The relative contributions of segmental versus prosodic factors to the perceived naturalness of synthetic speech were measured by transplanting prosody between natural speech and the output of a diphone synthesizer. A small corpus was created containing matched sentence pairs in which one member of the pair was a natural utterance and the other was a synthetic utterance generated with diphone data from the same talker. Two additional sentences were formed from each sentence pair by transplanting the prosodic structure between the natural and synthetic members of the pair. In two listening experiments, subjects were asked to (a) classify each sentence as "natural" or "synthetic", or (b) rate the naturalness of each sentence. Results showed that prosodic information was more important than segmental information in both the classification and the ratings of naturalness.
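
The stimulus design described in the abstract can be read as a 2 x 2 crossing of segmental source and prosodic source, giving four versions of each sentence. The following is a minimal sketch of that enumeration only; the names `SOURCES` and `stimulus_conditions` are hypothetical, and nothing here reflects the authors' actual tools or signal processing.

```python
from itertools import product

# The two possible sources for each factor (labels assumed for illustration).
SOURCES = ("natural", "synthetic")


def stimulus_conditions():
    """Return the four (segments, prosody) combinations for one sentence pair.

    A condition counts as "transplanted" when the prosody comes from the
    other member of the pair than the segments do.
    """
    conditions = []
    for segments, prosody in product(SOURCES, repeat=2):
        conditions.append(
            {
                "segments": segments,
                "prosody": prosody,
                "transplanted": segments != prosody,
            }
        )
    return conditions


if __name__ == "__main__":
    for condition in stimulus_conditions():
        print(condition)
```

Two of the four conditions are the original natural and synthetic utterances, and the other two are the prosody-transplanted versions formed from each pair.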

Bibliographic reference. Bunnell, H. Timothy / Hoskins, Steve R. / Yarrington, Debra (1998): "Prosodic vs. segmental contributions to naturalness in a diphone synthesizer", in ICSLP-1998, paper 0857.