This paper presents the approach under development at the Applied Science and Engineering Laboratories (ASEL) for automatic extraction of diphones from a speech database. The present system operates on a set of digitized spoken carrier words to (a) assign segment boundaries within the carrier words, (b) select the best instances of each diphone from among the carrier words containing that diphone, and (c) assign multiple context-conditioned boundaries within the selected diphones. In experiments designed to test intelligibility and naturalness, speech synthesized from the automatically extracted diphones was slightly more natural sounding and slightly less intelligible than speech synthesized from manually extracted diphones of the same talker.
Bibliographic reference. Yarrington, Debra / Bunnell, H. Timothy / Ball, Gene (1995): "Robust automatic extraction of diphones with variable boundaries", In EUROSPEECH-1995, 1845-1848.