Speech Prosody 2002
This paper reports on an experiment that explored the relevance of both acoustic and visual cues for signaling negative or affirmative feedback in a conversation. The stimuli were created with the WaveSurfer software developed at CTT by orthogonally varying six parameters (four visual and two acoustic), each with two settings: one hypothesised to lead to affirmative feedback responses, and one hypothesised to lead to negative responses. Listeners were told that they were going to see and hear a series of exchanges between a talking head, representing a travel agent, and a human who wants to make a booking with the agent. They were asked to imagine that they were standing beside the human, witnessing a fragment of a longer dialogue exchange. Their task was to rate each fragment in terms of whether the agent signals that he understands and accepts the human utterance, or whether the agent signals that he is uncertain about the human utterance. Results show that listeners are sensitive to both the visual and acoustic features when judging the utterances in terms of their function as feedback signals. Four of the six parameters had a significant influence on the judgements, with Smile and F0 as the most prominent, followed by Eyebrow and Head_movement. Eye_closure and Delay contributed only marginally to the judgements, but the tendency was in the expected direction.
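As an illustration of what orthogonally varying six two-level parameters implies, the following minimal sketch enumerates every combination of settings. It assumes a full factorial design, which the abstract does not state explicitly. The parameter names come from the abstract. The setting labels and the function name `full_factorial` are hypothetical and chosen only for this example.

```python
from itertools import product

# Parameter names as listed in the abstract. The labels "affirmative" and
# "negative" are illustrative placeholders for the two hypothesised settings.
PARAMETERS = ["Smile", "Eyebrow", "Head_movement", "Eye_closure", "F0", "Delay"]
SETTINGS = ("affirmative", "negative")


def full_factorial(parameters, settings):
    """Return one dict per stimulus, covering every combination of settings.

    Assumption: the design is a full factorial, so each parameter is crossed
    with every other parameter.
    """
    return [
        dict(zip(parameters, combo))
        for combo in product(settings, repeat=len(parameters))
    ]


stimuli = full_factorial(PARAMETERS, SETTINGS)
print(len(stimuli))  # 2**6 = 64 combinations
```

Under this assumption each parameter takes each setting in exactly half of the stimuli, which is what allows the contribution of each cue to be assessed independently of the others.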
Bibliographic reference. Granström, Björn / House, David / Swerts, Marc (2002): "Multimodal feedback cues in human-machine interactions", In SP-2002, 347-350.