EUROSPEECH 2001 Scandinavia
Concatenative speech synthesis quality depends in part on minimizing audible discontinuities between successive concatenated units. This study focuses on human detection of concatenation discontinuities in synthetic speech. A phonetic analysis compared the perceptual results from two voices, one female and one male. Neither a comprehensive phonetic analysis nor a comparison of discontinuity detection between voices has been reported previously. Although discontinuities were generally more detectable for the female voice than for the male voice, there were many similarities between the results obtained from the two speakers. A reliably higher detection rate was observed for diphthongs than for monophthong vowels. Post-vocalic consonants influenced concatenation discontinuities significantly more than pre-vocalic consonants did, and post-vocalic sonorants were associated with higher detection rates than post-vocalic non-sonorants. The differences in discontinuity detection among vowels and consonantal contexts, for both voices, consistently suggest that highly audible discontinuities are related to concatenation in regions of spectral change.
Bibliographic reference. Syrdal, Ann K. (2001): "Phonetic effects on listener detection of vowel concatenation", In EUROSPEECH-2001, 979-982.