Third International Conference on Spoken Language Processing (ICSLP 94)
As a preliminary to improving the naturalness of the synthetic male and female voices in a Danish text-to-speech system that uses a rule-driven formant synthesizer, the relative importance of the individual formant frequencies and bandwidths was investigated. Recordings of a Danish compound word consisting entirely of voiced segments were analyzed. Based on these recordings and their analysis, a number of manipulated synthetic stimuli, in which the variation of selected formant frequencies and bandwidths was simplified, were created and presented in two listening tests. The main results are: a) the bandwidths B5-B8 are more sensitive to simplification than the formant frequencies F5-F8; b) F5-F8 may be held constant throughout the utterance, and B1-B4 may be kept constant per segment, without perceptible loss of naturalness; c) B5-B8 may also be held constant, though with a minor loss of naturalness. A similar approach has been tried with female synthetic voices, and preliminary results corroborate the results outlined above. Among the more comprehensive simplifications in the male voice, a hierarchy of acceptability was established.
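The two kinds of simplification named in the abstract (a parameter held constant throughout the utterance, and a parameter kept constant per segment) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the use of the mean as the constant value, the frame values, and the segment boundaries are all assumptions made for illustration.

```python
from statistics import mean


def hold_constant(track):
    """Replace every frame of a parameter track with one utterance-wide value.

    Assumption: the constant is the mean of the track; the abstract does not
    say how the constant value was chosen.
    """
    constant = mean(track)
    return [constant] * len(track)


def hold_constant_per_segment(track, segments):
    """Replace the frames inside each segment with that segment's mean.

    `segments` is a list of (start, end) frame indices, end exclusive.
    Frames outside every listed segment are left unchanged.
    """
    simplified = list(track)
    for start, end in segments:
        constant = mean(track[start:end])
        simplified[start:end] = [constant] * (end - start)
    return simplified


# Hypothetical frame-by-frame values in Hz, not taken from the paper.
f5_track = [4400.0, 4500.0, 4600.0, 4500.0]
b1_track = [60.0, 80.0, 100.0, 120.0]
segments = [(0, 2), (2, 4)]  # two hypothetical segments of two frames each

f5_constant = hold_constant(f5_track)
b1_per_segment = hold_constant_per_segment(b1_track, segments)
```

In this sketch `f5_constant` carries a single value for the whole utterance, while `b1_per_segment` changes only at segment boundaries, which mirrors the distinction drawn in result b).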
Bibliographic reference. Dyhr, Niels-Jorn / Elmlund, Marianne / Henriksen, Carsten (1994): "Preserving naturalness in synthetic voices while minimizing variation in formant frequencies and bandwidths", In ICSLP-1994, 751-754.