First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Segment Selection and Pitch Modification for High Quality Speech Synthesis using Waveform Segments

Tomohisa Hirokawa, Kazuo Hakoda

NTT Human Interface Laboratories, Kanagawa, Japan

We propose a new speech synthesis method that concatenates waveform segments selected from a waveform dictionary. A modified PSOLA technique alters the pitch of the selected segments, and the limits of acceptable pitch shift are determined by preference tests. To make segment selection more accurate, we introduce a new selection factor that accounts for spectral continuity across voiced phoneme boundaries. The average spectral difference is reduced from 5.4 dB to 2.7 dB, and the synthesized voice is more fluent.
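
The abstract does not say how the spectral continuity factor is computed, so the following is only an illustrative sketch and not the authors' method. It assumes each candidate segment exposes a power spectrum for its first frame, measures continuity as the mean absolute log-spectral difference in dB, and picks the candidate closest to the end of the preceding segment. All function names, the distance measure, and the sample values are hypothetical.

```python
import math


def log_spectrum_db(power_spectrum, floor=1e-12):
    """Convert linear power values to decibels, flooring to avoid log(0)."""
    return [10.0 * math.log10(max(p, floor)) for p in power_spectrum]


def mean_spectral_difference_db(left_frame, right_frame):
    """Mean absolute difference in dB between two equal-length power spectra.

    Assumption: this simple measure stands in for whatever spectral
    distance the paper actually uses.
    """
    if len(left_frame) != len(right_frame):
        raise ValueError("frames must have the same number of bins")
    left_db = log_spectrum_db(left_frame)
    right_db = log_spectrum_db(right_frame)
    return sum(abs(a - b) for a, b in zip(left_db, right_db)) / len(left_db)


def select_segment(previous_end_frame, candidates):
    """Return the (name, start_frame) candidate with the smallest
    spectral difference to the end of the preceding segment."""
    return min(
        candidates,
        key=lambda c: mean_spectral_difference_db(previous_end_frame, c[1]),
    )


# Hypothetical 3-bin power spectra, for illustration only.
prev_end = [1.0, 10.0, 100.0]
candidates = [
    ("seg_a", [10.0, 10.0, 100.0]),
    ("seg_b", [1.0, 10.0, 1000.0]),
    ("seg_c", [1.0, 20.0, 100.0]),
]
best = select_segment(prev_end, candidates)
print(best[0])
```

In a full selector, a term like this would presumably be combined with the other selection factors rather than used alone; how the paper weights them is not stated in the abstract.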

Bibliographic reference.  Hirokawa, Tomohisa / Hakoda, Kazuo (1990): "Segment selection and pitch modification for high quality speech synthesis using waveform segments", In ICSLP-1990, 337-340.