Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Robust Automatic Extraction of Diphones with Variable Boundaries

Debra Yarrington (1), H. Timothy Bunnell (2), Gene Ball (2)

(1) Applied Science and Engineering Laboratories, University of Delaware / A.I. duPont Institute, Wilmington, Delaware, USA
(2) Microsoft Corporation, Redmond, Washington, USA

This paper presents the approach under development at the Applied Science and Engineering Laboratories (ASEL) for automatic extraction of diphones from a speech database. The present system operates on a set of digitized spoken carrier words to (a) assign segment boundaries within the carrier words, (b) select the best instances of each diphone among the carrier words containing that diphone, and (c) assign multiple context-conditioned boundaries within the selected diphones. Experiments designed to test intelligibility and naturalness indicated that speech synthesized from the automatically extracted diphones was slightly more natural sounding and slightly less intelligible than speech synthesized from manually extracted diphones of the same talker.

Bibliographic reference. Yarrington, Debra / Bunnell, H. Timothy / Ball, Gene (1995): "Robust automatic extraction of diphones with variable boundaries", in EUROSPEECH-1995, 1845-1848.