Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors

Georgia Zellou, Michelle Cohn


Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon’s Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural production or voice-AI): First, participants produced target words in a carrier phrase. Then, a model talker responded with an utterance containing the target word. The interlocutor responses varied by 1) communicative affect (social) and 2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model’s response and participants’ response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.


DOI: 10.21437/Interspeech.2020-1335

Cite as: Zellou, G., Cohn, M. (2020) Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors. Proc. Interspeech 2020, 1634-1638, DOI: 10.21437/Interspeech.2020-1335.


@inproceedings{Zellou2020,
  author={Georgia Zellou and Michelle Cohn},
  title={{Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1634--1638},
  doi={10.21437/Interspeech.2020-1335},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1335}
}