Auditory-Visual Speech Processing (AVSP) 2010
Hakone, Kanagawa, Japan
How is the perception of emotion affected by the provision of multiple sources of information (both within and across modality)? We examined how the perception of emotion differed depending upon which face regions were visible and which modality (auditory, visual, or combined auditory-visual, AV) was used. Auditory and visual speech of five talkers expressing anger, disgust, fear, happiness, sadness, surprise, or a neutral emotion were presented in face-only, voice-only, and face-voice presentation conditions. The visual speech stimuli presented the upper, lower, and whole face. The participants' task was to judge which emotion was expressed. The results showed that the upper and lower parts of the talker's face were not equally informative across emotion types. The face and voice also conveyed different degrees and types of emotion information. Response confusion matrices showed that, depending on the type of emotion, the whole-face response pattern resembled either the upper- or the lower-face one. For the AV face-voice stimuli, the response pattern changed depending on the relative informativeness of the unimodal signals. Based on these results, we suggest a model for how emotion information from different sources is combined to drive perception.
Index Terms: emotion perception, auditory-visual perception, emotion recognition, visual speech, face and voice.
Bibliographic reference. Kim, Jeesun / Davis, Chris (2010): "Emotion perception by eye and ear and halves and wholes", In AVSP-2010, paper S3-2.