4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper presents a set of novel algorithms for the signal modification component of concatenative text-to-speech systems. The algorithms are built on the LPC analysis/synthesis framework and achieve prosodic modification by time-domain processing of the LPC residual. The modified residual is then recombined with the all-pole spectral estimate to synthesise the new speech signal. The methods differ only in the processing applied to the residual signal. The first method uses a modified version of TD-PSOLA, relying on the assumption that the residual is decorrelated and spectrally flat to avoid spectral distortion. The second method uses multiple windows within each pitch period, so that a given pitch modification is realised by shifting several windowed segments by small amounts rather than by shifting a single window by a large amount. Again, the aim is to reduce the phase distortion introduced by the time-shifting process. The third method is based on a smoothly varying resampling of the residual rather than on windowed overlap-add. TD-PSOLA and the residual-based methods were evaluated in informal listening tests on pitch-scaled and time-scaled natural speech, and were also integrated into the signal processing stage of the BT Laureate text-to-speech system.
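The shared framework described above (LPC analysis, residual processing, all-pole resynthesis) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the LPC order, the synthetic test signal and the use of a single analysis frame are all assumptions made for illustration. The constant-factor linear-interpolation resampler is only a crude stand-in for the residual-processing step, not the smoothly varying resampling of the third method.

```python
import math
import random

def autocorrelation(x, order):
    """Return autocorrelation lags r[0..order] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Return [1, a1, ..., ap], the inverse filter A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error power
    return a

def inverse_filter(x, a):
    """LPC residual: e[n] = sum_k a[k] * x[n-k], zero initial state."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole synthesis: y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n in range(len(e)):
        fb = sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(e[n] - fb)
    return y

def resample_linear(e, factor):
    """Hypothetical placeholder: constant-factor linear-interpolation resampling."""
    out = []
    for m in range(int(len(e) * factor)):
        pos = m / factor
        i = int(pos)
        frac = pos - i
        nxt = e[i + 1] if i + 1 < len(e) else e[i]
        out.append((1.0 - frac) * e[i] + frac * nxt)
    return out

# Synthetic frame (assumption): two sinusoids plus a little seeded noise.
rng = random.Random(0)
x = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.12 * n)
     + 0.01 * rng.uniform(-1.0, 1.0) for n in range(400)]

ORDER = 8                                   # illustrative LPC order
a = levinson_durbin(autocorrelation(x, ORDER), ORDER)
e = inverse_filter(x, a)                    # analysis: signal -> residual
y = synthesis_filter(e, a)                  # unmodified residual: y matches x
e_mod = resample_linear(e, 1.25)            # stand-in residual modification
y_mod = synthesis_filter(e_mod, a)          # recombine with the all-pole filter
```

With the residual left untouched, the synthesis filter reproduces the input frame up to rounding error, which is the property that lets all of the modification be confined to the residual. In this simplified form, stretching the residual lowers the pitch and lengthens the signal together, while the all-pole spectral estimate is reused unchanged. A practical system would work frame by frame and update the filter coefficients over time.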
Bibliographic reference. Edgington, Mike D. / Lowry, A. (1996): "Residual-based speech modification algorithms for text-to-speech synthesis", In ICSLP-1996, 1425-1428.