Auditory-Visual Speech Processing (AVSP) 2009
University of East Anglia, Norwich, UK
Recent findings demonstrate that audiovisual fusion during speech perception may involve pre-phonetic processing. The aim of the current experiment is to investigate this hypothesis using a pairing task between auditory vowel sequences and non-speech visual cues. Each audio sequence is composed of six auditory French vowels alternating (or not) in pitch, so as to form two interleaved streams of three vowels each. Various elementary visual displays are presented in synchrony with one of the two vowel streams. Our hypothesis is that, in a forced-choice pairing task, the audiovisually synchronized vowels will be identified more frequently if such perceptual binding operates. We show that the most efficient visual feature for increasing pairing performance is movement. Surprisingly, some of the features we manipulated do not improve pairing performance: the visual cue of contrast variation is not correctly paired with the synchronized auditory vowels; auditory segregation, based on the pitch difference between the vowel streams, has no additional effect on pairing; and modulation of the auditory envelope, synchronized with the variation of the visual cue, also has no effect. Finally, when we introduce a phonetic cue in the visual display, pairing increases in comparison with non-specific visual cues. The relative contributions of perceptual binding and late phonetic fusion are discussed.
Index Terms: audiovisual fusion, perceptual binding, multimodal phonetic processing
Bibliographic reference. Devergie, Aymeric / Berthommier, Frédéric / Grimault, Nicolas (2009): "Pairing audio speech and various visual displays: binding or not binding?", In AVSP-2009, 140-144.