Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper describes a method for generating sets of natural-sounding speech stimuli that change gradually from one speech signal to another. Stimulus continua created with this method were used in a large number of psychophysical identification and discrimination experiments. The two recorded speech stimuli between which a continuum is made are first analyzed according to the Sine Wave Generation method [2,3,4,5,6,7,8]. This yields a set of parameters per frame: the frequency, and the amplitude and phase of both vocal tract and vocal source, at the 50 major peaks of the Short-Time FFT spectrum. Because the vocal tract amplitudes in each frame carry the information about the spectral envelope, modifying these amplitudes modifies the spectral envelope, and thus produces a different "timbre".
Linear interpolation between the spectral envelope amplitudes (SE-amplitudes) of the two recorded speech sounds results in a set of spectral envelopes that change gradually from one sound to the other. Replacing the original SE-amplitudes of one of the two original stimuli (the mother stimulus) with those of the interpolated set results, after resynthesis, in a set of stimuli that differ only in timbre; they change gradually from one sound to the other.
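The interpolation step for a single frame can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `interpolate_envelopes`, the number of steps, and the representation of the SE-amplitudes as a plain list of values are assumptions, and the abstract does not state in which domain (linear amplitude, log amplitude, or cepstral coefficients) the interpolation is carried out. The sketch also assumes the two stimuli are already aligned so that both lists describe corresponding frames.

```python
def interpolate_envelopes(amps_a, amps_b, n_steps):
    """Return n_steps envelopes that move linearly from amps_a to amps_b.

    amps_a, amps_b: SE-amplitudes of one frame for the two recorded
    stimuli (hypothetical representation: equal-length lists of floats).
    n_steps: number of stimuli in the continuum, endpoints included.
    """
    if len(amps_a) != len(amps_b):
        raise ValueError("both envelopes need the same number of values")
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")

    continuum = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # 0.0 at stimulus A, 1.0 at stimulus B
        continuum.append(
            [(1 - w) * a + w * b for a, b in zip(amps_a, amps_b)]
        )
    return continuum


# Three-step continuum for a toy frame with two envelope values.
steps = interpolate_envelopes([0.0, 10.0], [10.0, 20.0], 3)
print(steps)
```

Each envelope in the returned list would then replace the SE-amplitudes of the corresponding frame of the mother stimulus before resynthesis, while all other parameters stay unchanged.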
High-quality speech is obtained because the stimuli are resynthesized with all their original parameters; only the SE-amplitudes are modified. The resulting speech sounds retain all the speaker-specific characteristics of the mother stimulus and sound very natural because no important information is lost.
Bibliographic reference. Hessen, Arjan van (1992): "Generation of natural sounding speech stimuli by means of linear cepstral interpolation", In ICSLP-1992, 1163-1166.