5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Psychophysical Evaluation of PSOLA: Natural Versus Synthetic Speech

Reinier Kortekaas, Armin Kohlrausch

IPO Center for Research on User-System Interaction, Eindhoven, The Netherlands

This paper presents the results of psychophysical experiments on pitch-marker positioning within the Pitch Synchronous OverLap and Add (PSOLA) framework. The fundamental frequency of sustained natural vowels was modified using PSOLA. The experiments were aimed at determining the auditory sensitivity to (1) deterministic shifts of either all or single pitch markers within a sequence, and (2) random shifts of all pitch markers ("jitter"). For deterministic shifts of all pitch markers, the results were in reasonable agreement with results obtained previously for synthetic formant signals. For deterministic shifts of single pitch markers, thresholds depended on the position of the shifted marker in the sequence. Detection thresholds for jittered shifts were comparable to thresholds for detecting jitter in pulse trains. The ranking of the thresholds for these three conditions indicated that the auditory system is more sensitive to dynamic (modulation) cues than to static (timbral) cues arising from shifts in pitch-marker positioning.

Bibliographic reference. Kortekaas, Reinier / Kohlrausch, Armin (1997): "Psychophysical evaluation of PSOLA: natural versus synthetic speech", in EUROSPEECH-1997, 2487-2490.