INTERSPEECH 2006 - ICSLP
In natural speech, there is a moderate correlation between the fundamental frequency and formant frequencies across talkers. The present study used a high-quality vocoder to manipulate these properties and determine their contribution to perceived naturalness and voice gender. The stimuli were re-synthesized sentences spoken by two adult males and two adult females. Scale factors were chosen for each sentence and for each talker to produce frequency-shifted versions with a specified mean fundamental frequency (F0) ranging from 60 Hz to 450 Hz in 10 steps, paired with 10 steps in geometric mean formant frequencies ranging from 850 Hz to 2500 Hz. Listeners judged frequency-shifted sentences as more natural when F0 and formant frequencies followed the co-variation of F0 and formant frequencies in natural voices. Sentences with low F0 and low formant frequencies were perceived as masculine, while sentences with high F0 and high formant frequencies were assigned high ratings of femininity. Sentences with "mismatched" F0 and formant frequencies were assigned ratings near the midpoint of the range, indicating gender ambiguity. Frequency-shifted sentences derived from male talkers received consistently higher ratings of masculinity than those derived from females and vice versa, even when assigned scale factors appropriate for the opposite gender, indicating that factors other than F0 and mean formant frequencies contribute to perceived gender.
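The per-sentence scale factors described above can be illustrated with a minimal sketch. It assumes that each scale factor is simply the target value divided by the value measured for that sentence, that the 10 steps are equally spaced on a logarithmic scale, and that every F0 step is combined with every formant step. The abstract states none of these details, so they are assumptions for illustration only. The function names and the measured values (120 Hz mean F0, 1500 Hz geometric mean formant frequency) are hypothetical and are not taken from the study.

```python
def log_spaced(lo, hi, n):
    """Return n values from lo to hi, equally spaced on a log scale.

    Log spacing is an assumption; the abstract gives only the range
    and the number of steps.
    """
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** i for i in range(n)]


def scale_factors(measured_f0, measured_formant_mean, f0_targets, formant_targets):
    """Return a grid of (f0_scale, formant_scale) pairs for one sentence.

    Each scale factor is target / measured, so multiplying the sentence's
    measured mean by the factor yields the target mean.
    """
    return [
        [(f0_t / measured_f0, ff_t / measured_formant_mean) for ff_t in formant_targets]
        for f0_t in f0_targets
    ]


# Target ranges taken from the abstract.
f0_targets = log_spaced(60.0, 450.0, 10)
formant_targets = log_spaced(850.0, 2500.0, 10)

# Hypothetical measurements for one sentence from one talker.
grid = scale_factors(120.0, 1500.0, f0_targets, formant_targets)

# Scale factors for the lowest F0 target paired with the lowest formant target.
f0_scale, formant_scale = grid[0][0]
```

Under these assumptions, a sentence with a measured mean F0 of 120 Hz needs an F0 scale factor of 0.5 to reach the 60 Hz target. Because the factors are computed separately for each sentence and talker, all talkers end up at the same target means regardless of their natural voice.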
Bibliographic reference. Assmann, Peter F. / Dembling, Sophia / Nearey, Terrance M. (2006): "Effects of frequency shifts on perceived naturalness and gender information in speech", In INTERSPEECH-2006, paper 1710-Tue1BuP.10.