Adjusting the Frame: Biphasic Performative Control of Speech Rhythm

Samuel Delalez, Christophe d’Alessandro


Performative time and pitch scaling is a new research paradigm for prosodic analysis by synthesis. In this paper, a system for real-time time and pitch scaling of recorded speech by means of hand or foot gestures is designed and evaluated. Pitch is controlled with the preferred hand, using a stylus on a graphic tablet. Time is controlled using rhythmic frames, or constriction gestures, defined by pairs of control points. The “Arsis” corresponds to the constriction (weak beat of the syllable) and the “Thesis” corresponds to the vocalic nucleus (strong beat of the syllable). This biphasic control of rhythmic units is performed by the non-preferred hand using a button. Pitch and time scales are modified according to these gestural controls using a real-time pitch-synchronous overlap-add technique (RT-PSOLA). Rhythm and pitch control accuracy are assessed in a prosodic imitation experiment in which the task is to reproduce the intonation and rhythm of various sentences. The results show that inter-vocalic durations differ by only 20 ms on average. The system appears to be a new and effective tool for performative speech and singing synthesis. Consequences and applications in speech prosody research are discussed.
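
As an illustration only, the time-control idea can be sketched as a mapping between control points marked in the recording and the instants at which the performer triggers them. The abstract does not specify how the system interpolates between control points, so the piecewise-linear mapping below is an assumption, and the function names `segment_stretch_factors` and `performed_to_source_time` are hypothetical. The sketch computes one time-scale factor per segment between consecutive control points (alternating Arsis and Thesis) and maps a performance time back to a position in the recording.

```python
import numpy as np


def _validate(source_points, performed_points):
    """Return both point lists as float arrays after basic sanity checks."""
    source = np.asarray(source_points, dtype=float)
    performed = np.asarray(performed_points, dtype=float)
    if source.ndim != 1 or source.shape != performed.shape or source.size < 2:
        raise ValueError("need two equal-length 1-D lists of at least 2 points")
    if np.any(np.diff(source) <= 0) or np.any(np.diff(performed) <= 0):
        raise ValueError("control points must be strictly increasing in time")
    return source, performed


def segment_stretch_factors(source_points, performed_points):
    """Time-scale factor for each segment between consecutive control points.

    A factor above 1 means the performer stretched that segment, below 1
    means the segment was compressed (assumed piecewise-linear warping).
    """
    source, performed = _validate(source_points, performed_points)
    return np.diff(performed) / np.diff(source)


def performed_to_source_time(t, source_points, performed_points):
    """Map a performance time (seconds) to a read position in the recording."""
    source, performed = _validate(source_points, performed_points)
    return np.interp(t, performed, source)


# Hypothetical control points in seconds: Arsis, Thesis, Arsis, Thesis.
source_points = [0.10, 0.25, 0.40, 0.60]     # positions in the recording
performed_points = [0.00, 0.30, 0.45, 0.55]  # instants of the button events

factors = segment_stretch_factors(source_points, performed_points)
position = performed_to_source_time(0.15, source_points, performed_points)
```

In a real-time system such as the one described, a read position of this kind would presumably drive the overlap-add stage; that stage is not shown here.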


DOI: 10.21437/Interspeech.2017-396

Cite as: Delalez, S., d’Alessandro, C. (2017) Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. Proc. Interspeech 2017, 864-868, DOI: 10.21437/Interspeech.2017-396.


@inproceedings{Delalez2017,
  author={Samuel Delalez and Christophe d’Alessandro},
  title={Adjusting the Frame: Biphasic Performative Control of Speech Rhythm},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={864--868},
  doi={10.21437/Interspeech.2017-396},
  url={http://dx.doi.org/10.21437/Interspeech.2017-396}
}