First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Phoneme Segment Concatenation and Excitation Control Based on Spectral Distortion Criterion for Speech Synthesis

Kenzo Itoh, Hideyuki Mizuno, Tetsuya Nomura, Hirokazu Sato

Speech and Acoustics Laboratory, NTT Human Interface Laboratories, Kanagawa, Japan

This paper proposes two new methods for high-quality speech synthesis, both based on a spectral distortion criterion. The first is a phoneme segment selection method that uses an objective continuity measure. The second is an excitation signal extraction method for pitch and duration control. The continuity measure is defined in terms of the continuity of the LPC spectral envelopes at segment joins. When this measure is used to select the optimum segments, natural-sounding synthetic speech is produced without any smoothing. For pitch and duration control, we propose an automatic excitation signal extraction method for a residual-excited LPC vocoder. This method also relies on a spectral distortion criterion between the original and synthetic speech. With this pitch and duration control method, the average LPC cepstrum distortion (CD) decreases from 1.90 dB to 1.01 dB.
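The abstract does not give the exact form of the continuity measure or the search procedure. The Python below is a minimal sketch under two stated assumptions. First, spectral continuity at a join is scored with a common LPC cepstral distortion between the last frame of one segment and the first frame of the next. Second, the optimum selection is found by a Viterbi-style dynamic-programming search. `Candidate`, `select_segments`, and the other names are hypothetical. The CD formula shown is one widely used variant and is not necessarily the one used in this paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Scale factor that converts a natural-log cepstral distance into dB.
DB_SCALE = 10.0 / math.log(10.0)


def lpc_to_cepstrum(alpha: Sequence[float], n_ceps: int) -> List[float]:
    """Convert predictor coefficients alpha_1..alpha_p (convention:
    x[n] ~ sum_k alpha_k x[n-k]) into LPC cepstral coefficients
    c_1..c_{n_ceps} with the standard recursion; the gain term c_0 is ignored."""
    p = len(alpha)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_distortion_db(c_a: Sequence[float], c_b: Sequence[float]) -> float:
    """Assumed CD variant: (10 / ln 10) * sqrt(2 * sum_k (c_a[k] - c_b[k])^2), k >= 1."""
    sq = sum((x - y) ** 2 for x, y in zip(c_a, c_b))
    return DB_SCALE * math.sqrt(2.0 * sq)


@dataclass
class Candidate:
    """Hypothetical segment record: a name plus the cepstra of its edge frames."""
    name: str
    first_ceps: List[float]  # cepstrum of the segment's first frame
    last_ceps: List[float]   # cepstrum of the segment's last frame


def select_segments(lattice: List[List[Candidate]]) -> Tuple[List[str], float]:
    """Viterbi search for the candidate sequence whose summed join distortion
    (last frame of one segment vs. first frame of the next) is minimal."""
    prev_cost = [0.0] * len(lattice[0])
    backptrs: List[List[int]] = []
    for i in range(1, len(lattice)):
        cur_cost, cur_ptr = [], []
        for v in lattice[i]:
            # Accumulated cost of reaching v from each candidate at position i-1.
            joins = [prev_cost[j] + cepstral_distortion_db(u.last_ceps, v.first_ceps)
                     for j, u in enumerate(lattice[i - 1])]
            j_best = min(range(len(joins)), key=joins.__getitem__)
            cur_cost.append(joins[j_best])
            cur_ptr.append(j_best)
        prev_cost = cur_cost
        backptrs.append(cur_ptr)
    # Trace back from the cheapest candidate at the final position.
    j = min(range(len(prev_cost)), key=prev_cost.__getitem__)
    total = prev_cost[j]
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [lattice[i][k].name for i, k in enumerate(path)], total


if __name__ == "__main__":
    # Single-pole check: for alpha = [a], c_n = a**n / n.
    print(lpc_to_cepstrum([0.5], 3))
    toy = [  # hypothetical two-coefficient cepstra
        [Candidate("a1", [0.0, 0.0], [1.0, 0.0])],
        [Candidate("b1", [1.0, 0.0], [0.0, 1.0]), Candidate("b2", [0.0, 0.0], [0.0, 0.0])],
        [Candidate("c1", [0.0, 1.0], [0.0, 0.0]), Candidate("c2", [5.0, 5.0], [0.0, 0.0])],
    ]
    print(select_segments(toy))
```

Because the join cost is additive over boundaries, the dynamic-programming search finds the global minimum in time linear in the number of phoneme positions and quadratic in the number of candidates per position.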

