Second ESCA/IEEE Workshop on Speech Synthesis

September 12-15, 1994
Mohonk Mountain House, New Paltz, NY, USA

Combining Concatenation and Formant Synthesis for Improved Intelligibility and Naturalness in Text-to-Speech Systems

Steve Pearson, Heather Moran, Kazue Hata, Frode Holm

Speech Technology Laboratory, Panasonic Technologies, Inc., Santa Barbara, CA, USA

A general method that combines formant synthesis by rule with time-domain concatenation is investigated. The method aims to keep the advantages of both techniques while minimizing their respective difficulties, such as prosodic modification and spectral discontinuities at the points of concatenation. We have integrated a sampled natural glottal source [1] and sampled voiceless consonants into a real-time text-to-speech formant synthesizer. We have also incorporated, in special cases, voicing amplitude envelopes and formant transitions derived from natural speech. Several listening tests were performed to evaluate these methods, and the initial results are very promising. As was found for Japanese [2], we obtained a significant overall improvement in intelligibility over our previous formant synthesizer. The results of subjective analysis also show that these methods can improve naturalness and listenability.
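As a rough illustration of the hybrid idea described in the abstract, the sketch below drives a cascade of formant resonators with a stored glottal pulse and prepends a stored voiceless segment by plain time-domain concatenation. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the sample rate, the formant values, the raised-cosine stand-in for a sampled natural glottal pulse, the noise stand-in for a sampled voiceless consonant, and all function names. The resonator is the common two-pole digital resonator with unity gain at DC.

```python
import math
import random

SAMPLE_RATE = 10000  # Hz; assumed for illustration only


def resonator_coeffs(freq_hz, bw_hz, sample_rate=SAMPLE_RATE):
    """Coefficients (a, b, c) of a two-pole resonator with unity gain at DC."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz):
    """Filter a signal with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voiced_source(pulse, period, n_samples):
    """Place one copy of the stored pulse every `period` samples (overlap-add)."""
    out = [0.0] * n_samples
    for start in range(0, n_samples, period):
        for i, value in enumerate(pulse):
            if start + i < n_samples:
                out[start + i] += value
    return out


def synthesize_cv(consonant, pulse, formants, period, n_voiced):
    """Concatenate a stored voiceless segment with a rule-driven formant vowel."""
    vowel = voiced_source(pulse, period, n_voiced)
    for freq_hz, bw_hz in formants:  # cascade of formant resonators
        vowel = resonate(vowel, freq_hz, bw_hz)
    return list(consonant) + vowel


# Hypothetical stand-ins for the sampled material (not real speech data).
pulse = [0.5 * (1.0 - math.cos(2.0 * math.pi * i / 40)) for i in range(40)]
rng = random.Random(0)
consonant = [rng.uniform(-0.1, 0.1) for _ in range(500)]

# Assumed (frequency, bandwidth) pairs in Hz for an open vowel.
formants = [(700, 60), (1200, 90), (2600, 150)]

speech = synthesize_cv(consonant, pulse, formants, period=100, n_voiced=3000)
```

In this sketch the pitch is set only by the spacing of the stored pulses (`period`), so prosody can be changed without resampling the vowel, while the voiceless segment is copied unchanged from storage.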


  1. Matsui, K., S. Pearson, K. Hata, and T. Kamai. Improving Naturalness in Text-to-Speech Synthesis using Natural Glottal Source. Proc. ICASSP 2.769-772, May 1991, Toronto, Canada.
  2. Kamai, T., and K. Matsui. Investigation of Formant Synthesis Hybridized by Introduction of Natural Waveform Segments. Acoustical Society of Japan Meeting, May 1993.


Bibliographic reference.  Pearson, Steve / Moran, Heather / Hata, Kazue / Holm, Frode (1994): "Combining concatenation and formant synthesis for improved intelligibility and naturalness in text-to-speech systems", In SSW2-1994, 69-72.