4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Enhanced Shape-invariant Pitch and Time-scale Modification for Concatenative Speech Synthesis

M. P. Pollard (1), B. M. G. Cheetham (1), C. C. Goodyear (1), Mike D. Edgington (2), A. Lowry (2)

(1) Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool, UK
(2) B.T. Laboratories, Martlesham Heath, Ipswich, UK

To preserve shape invariance when modifying the pitch or time scale of sinusoidally modelled voiced speech, the phases of the sinusoids used to model the glottal excitation are made to add coherently at estimated excitation points. Previous methods achieve this by estimating excitation phases at synthesis frame boundaries, disregarding the frequency modulation that may occur between the frame boundary and the nearest modified excitation point. This approximation can produce a significant misalignment of the excitation phases, which distorts the temporal structure of the synthetic speech. This paper proposes a shape-invariant technique that aligns the excitation phases at the excitation points themselves, whilst allowing for variations in the frequency of the sinusoidal components.
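
The size of the misalignment described above can be illustrated with a small numerical sketch. It is not the authors' algorithm: it assumes, purely for illustration, that the fundamental frequency varies linearly across one synthesis frame (the abstract does not specify a frequency model), and all function and parameter names are hypothetical. Each harmonic is given a phase at the frame boundary chosen so that it should reach zero phase at the excitation point. The sketch compares a boundary phase computed under a constant-frequency assumption with one that integrates the varying frequency.

```python
import math


def integrated_cycles(f0_start, f0_end, frame_len, t):
    """Cycles of the fundamental accumulated from the frame boundary (t = 0)
    to time t, assuming f0 varies linearly over the frame (an assumption)."""
    slope = (f0_end - f0_start) / frame_len  # Hz per second
    return f0_start * t + 0.5 * slope * t * t


def boundary_phase_constant(k, f0_start, t_exc):
    """Boundary phase of harmonic k if f0 is assumed constant up to t_exc."""
    return -2.0 * math.pi * k * f0_start * t_exc


def boundary_phase_chirp(k, f0_start, f0_end, frame_len, t_exc):
    """Boundary phase of harmonic k allowing for the linear change in f0."""
    cycles = integrated_cycles(f0_start, f0_end, frame_len, t_exc)
    return -(2.0 * math.pi * k * cycles)


def phase_at(k, phi0, f0_start, f0_end, frame_len, t):
    """Actual phase of harmonic k at time t, starting from boundary phase phi0."""
    cycles = integrated_cycles(f0_start, f0_end, frame_len, t)
    return phi0 + 2.0 * math.pi * k * cycles


def wrap(phase):
    """Wrap a phase in radians to the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def misalignment(k, f0_start, f0_end, frame_len, t_exc):
    """Wrapped phase error of harmonic k at the excitation point when the
    boundary phase was computed with the constant-frequency approximation."""
    phi0 = boundary_phase_constant(k, f0_start, t_exc)
    return wrap(phase_at(k, phi0, f0_start, f0_end, frame_len, t_exc))


if __name__ == "__main__":
    # Hypothetical values: f0 rises from 100 Hz to 110 Hz over a 10 ms frame,
    # with the excitation point 8 ms after the frame boundary.
    for k in (1, 5, 10):
        err = misalignment(k, 100.0, 110.0, 0.010, 0.008)
        print(f"harmonic {k:2d}: phase error {err:.3f} rad")
```

Under these assumed values the unwrapped error grows in proportion to the harmonic number, so the higher harmonics no longer add coherently at the excitation point. When the boundary phase is computed with `boundary_phase_chirp` instead, every harmonic reaches zero phase there.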

Bibliographic reference.  Pollard, M. P. / Cheetham, B. M. G. / Goodyear, C. C. / Edgington, Mike D. / Lowry, A. (1996): "Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis", In ICSLP-1996, 1433-1436.