Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Spectral Control in Concatenative Speech Synthesis

Alexander B. Kain, Qi Miao, Jan P. H. van Santen

Center for Spoken Language Understanding (CSLU), OGI School of Science & Engineering, Oregon Health & Science University (OHSU), Beaverton, OR, USA

We report on research that increased the degree of spectral control in concatenative synthesis by controlling the formant frequencies of the synthetic speech, as well as the energies in four spectral bands. In addition, we replaced "points" of concatenation with "regions" of concatenation by cross-fading between the end of one speech segment and the beginning of the next. We hypothesized that these approaches would reduce the frequency and severity of audible discontinuities in the synthetic speech and thereby increase its perceived quality. A listening test showed that stimuli created with the proposed methods had significantly higher perceived quality.
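
To illustrate the difference between a "point" and a "region" of concatenation, the following is a minimal sketch of a linear cross-fade between two sample sequences. It is an assumption-laden illustration only: the abstract does not say in which domain (waveform or spectral parameters) the cross-fade is performed, nor which weighting function is used, and the function name `crossfade` and its `overlap` parameter are hypothetical.

```python
def crossfade(left, right, overlap):
    """Join two sequences, blending the last `overlap` values of `left`
    with the first `overlap` values of `right` using linear weights.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        # Degenerate case: a hard join at a single concatenation point.
        return list(left) + list(right)

    start = len(left) - overlap
    head = list(left[:start])      # part of `left` before the join region
    tail = list(right[overlap:])   # part of `right` after the join region

    blended = []
    for i in range(overlap):
        # Weight of the right-hand segment rises linearly across the region
        # and stays strictly between 0 and 1.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * left[start + i] + w * right[i])

    return head + blended + tail


if __name__ == "__main__":
    a = [1.0, 1.0, 1.0, 1.0]
    b = [0.0, 0.0, 0.0, 0.0]
    print(crossfade(a, b, 0))  # hard join: abrupt step from 1.0 to 0.0
    print(crossfade(a, b, 2))  # join region: gradual transition
```

With `overlap=0` the two segments simply abut, which corresponds to a concatenation point; a positive `overlap` spreads the transition over several values, which is the intuition behind a concatenation region.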

Bibliographic reference.  Kain, Alexander B. / Miao, Qi / Santen, Jan P. H. van (2007): "Spectral control in concatenative speech synthesis", In SSW6-2007, 11-16.