Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Speech Segment Network Approach for an Optimal Synthesis Unit Set

Naoto Iwahashi, Yoshinori Sagisaka

ATR Interpreting Telephony Research Laboratories, Kyoto, Japan

In this paper, a Speech Segment Network (SSN) approach is proposed for constructing a small speech unit set with which high-quality speech can be synthesized. The SSN approach selects a speech unit set in which segmental and/or inter-segmental distortions are minimized, using combinatorial optimization methods such as iterative improvement or simulated annealing. Experimental results using diphone segments showed that this method can construct optimal diphone unit sets in which the total inter-segmental distortion is reduced by about 35%, or the maximum inter-segmental distortion by about 70%. The reduction grows as the segment population increases. The effectiveness of this unit set design was also confirmed perceptually in a listening test using speech synthesized with the selected diphone unit set.
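The selection step can be illustrated with a minimal sketch of iterative improvement, one of the optimization methods named above. This is not the authors' implementation. The abstract does not specify the distortion measure or the network structure, so the sketch assumes each candidate segment is summarized by one scalar feature at each edge, and that inter-segmental distortion is the absolute difference between the edges that meet at a join. The names `candidates`, `joins`, `total_distortion` and `iterative_improvement`, and all numeric values, are hypothetical.

```python
def total_distortion(selection, candidates, joins):
    """Sum the edge mismatch over every pair of units that can be concatenated."""
    cost = 0.0
    for first, second in joins:
        right_edge = candidates[first][selection[first]][1]
        left_edge = candidates[second][selection[second]][0]
        cost += abs(right_edge - left_edge)  # assumed distortion measure
    return cost


def iterative_improvement(candidates, joins):
    """Swap one unit's candidate at a time and keep only swaps that lower the total."""
    selection = {unit: 0 for unit in candidates}  # start from the first candidate
    best = total_distortion(selection, candidates, joins)
    improved = True
    while improved:
        improved = False
        for unit, options in candidates.items():
            for index in range(len(options)):
                if index == selection[unit]:
                    continue
                previous = selection[unit]
                selection[unit] = index
                cost = total_distortion(selection, candidates, joins)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    selection[unit] = previous  # undo a swap that did not help
    return selection, best


# Hypothetical toy data: diphone -> list of (left_edge, right_edge) candidates.
candidates = {
    "ka": [(1.0, 5.0), (1.2, 3.0)],
    "as": [(3.1, 2.0), (4.8, 2.5)],
    "sa": [(2.0, 1.0), (2.6, 1.1)],
}
# Pairs whose shared phone allows concatenation (first unit followed by second).
joins = [("ka", "as"), ("as", "sa"), ("sa", "as")]

initial = total_distortion({unit: 0 for unit in candidates}, candidates, joins)
selection, best = iterative_improvement(candidates, joins)
print(selection, initial, best)
```

Iterative improvement stops at the first selection that no single swap can improve, which may be only a local minimum. Simulated annealing differs in that it occasionally accepts a swap that raises the cost, which is one way to escape such minima.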

