First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Text-to-Speech Synthesis Using a Natural Voice Source

Stephen D. Pearson (1), Hector R. Javkin (1), Kenji Matsui (2), Takahiro Kamai (3)

(1) Speech Technology Laboratory, Division of Panasonic Technologies, Inc., Santa Barbara, CA, USA
(2) Central Research Laboratories, Matsushita Electric Industrial Co., Ltd., Osaka, Japan
(3) Faculty of Engineering, Osaka University, Osaka, Japan

Our aim is to improve the naturalness of text-to-speech synthesis and its ability to model individual speakers. This paper describes several methods for using inverse-filtered waveforms from natural speech as the voice source in a text-to-speech system. One method plays a stored waveform as a repeating loop and controls pitch by interpolating between its samples. Another method builds a source waveform of the desired pitch by concatenating single pulses drawn from a collection of pulses. Listening tests were carried out to compare these methods with each other and with more traditional voice source generation techniques. The results indicate that these "natural glottal source" methods can substantially improve the quality of text-to-speech synthesis.
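
The abstract gives no implementation details, so the following Python is only an illustrative sketch of the two ideas it names, not the authors' algorithm. It assumes that one inverse-filtered glottal pulse (or a bank of such pulses) is already available as a list of samples. The function names, the use of linear interpolation, the cyclic treatment of the stored pulse, and the "closest length" selection rule are all assumptions made for the example.

```python
def resample_pulse(pulse, target_len):
    """Linearly interpolate one stored pulse to target_len samples.

    Assumption: the pulse is treated as cyclic, so the last output
    samples interpolate back toward the first stored sample.
    """
    n = len(pulse)
    if n < 2 or target_len < 1:
        raise ValueError("need at least 2 stored samples and 1 output sample")
    out = []
    for i in range(target_len):
        pos = i * n / target_len      # fractional read position in the stored pulse
        k = int(pos)
        frac = pos - k
        a = pulse[k % n]
        b = pulse[(k + 1) % n]
        out.append(a + frac * (b - a))
    return out


def looped_source(pulse, sample_rate, f0_contour):
    """Loop method (sketch): repeat one pulse, stretched or compressed
    by interpolation so each period matches the requested F0 in Hz."""
    out = []
    for f0 in f0_contour:
        period = max(2, round(sample_rate / f0))  # samples in this pitch period
        out.extend(resample_pulse(pulse, period))
    return out


def concatenated_source(pulse_bank, sample_rate, f0_contour):
    """Concatenation method (sketch): for each period, pick the stored
    pulse whose length is closest to the desired period and append it
    unchanged. The selection rule is a hypothetical choice."""
    out = []
    for f0 in f0_contour:
        period = sample_rate / f0
        best = min(pulse_bank, key=lambda p: abs(len(p) - period))
        out.extend(best)
    return out
```

For example, at a hypothetical 8000 Hz sample rate, `looped_source(pulse, 8000, [100, 200])` would produce one 80-sample period followed by one 40-sample period, both derived from the same stored pulse. The concatenation sketch instead leaves each stored pulse untouched, so the achievable pitch values depend on which pulse lengths are in the bank.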


Bibliographic reference. Pearson, Stephen D. / Javkin, Hector R. / Matsui, Kenji / Kamai, Takahiro (1990): "Text-to-speech synthesis using a natural voice source", in ICSLP-1990, 193-196.