5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Cultural Similarities and Differences in the Recognition of Audio-Visual Speech Stimuli

Sumi Shigeno

Kitasato University, Japan

This study compared Japanese and North American subjects to identify cultural similarities and differences in the recognition of emotion. Seven native Japanese subjects and five native North American subjects (four Americans and one Canadian) participated in the experiments. The materials were five meaningful words or short sentences in Japanese and American English. Japanese and American actors produced vocal and facial expressions to convey six basic emotions: happiness, surprise, anger, disgust, fear, and sadness. Three presentation conditions were used: auditory, visual, and audio-visual. The audio-visual stimuli were made by dubbing the auditory stimuli onto the visual stimuli. The results show that (1) subjects can more easily recognize the vocal expression of a speaker who belongs to their own culture, (2) Japanese subjects are poor at recognizing 'fear' in both the auditory-alone and visual-alone conditions, and (3) both Japanese and American subjects more often label audio-visually incongruent stimuli according to the visual emotion than according to the auditory emotion. These results suggest that it is difficult to identify the emotion of a speaker from a different culture and that people rely predominantly on visual information to identify emotion.

Bibliographic reference. Shigeno, Sumi (1998): "Cultural similarities and differences in the recognition of audio-visual speech stimuli", in ICSLP-1998, paper 1057.