Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
Bimodal perception allows a better understanding of speech than audition alone. In this paper, we quantify the intelligibility gain from presenting the speaker's face along with the auditory stimuli, as a function of distortion by added white noise. Eighteen French subjects with good hearing and vision were given a closed-choice identification test of three vowels [i, a, y] and six consonants [b, v, z, ʒ, r, l] under audio-alone and audiovisual presentation conditions. Mean identification scores first give us a measurement of the global improvement provided by bimodal perception: the audio-alone identification score decreases from 72% to 8% when S/N decreases from -6 dB to -18 dB, while the audiovisual identification score only decreases from 93% to 77%. Next, the comparison of confusion matrices allows us to discuss the respective effects of each vowel: in the audio-alone condition, [a] is more intelligible than [i], which is in turn more intelligible than [y], especially under highly degraded acoustic conditions; in bimodal perception, and under highly degraded acoustic conditions (S/N < -18 dB), [y] is more easily identified than [a], which is in turn more easily identified than [i]. Finally, we quantify the importance of the contextual effect of the three vowels on the audio-alone and audiovisual intelligibility of the six consonants: [y] impairs both the audio-alone and the audiovisual intelligibility of surrounding consonants, [a] improves both, and [i] is the vocalic context that best facilitates the visual identification of its surrounding consonants.
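As a rough illustration of the global improvement quoted above, the gain from seeing the speaker's face can be read as the difference, in percentage points, between the audiovisual and audio-alone mean scores at each S/N. The sketch below uses only the four endpoint scores given in the abstract; the names `audio_alone`, `audiovisual`, and `visual_gain` are hypothetical, and a simple difference in points is an assumption made here, not necessarily the measure used in the paper.

```python
# Mean identification scores (%) quoted in the abstract, keyed by S/N in dB.
# Only the two endpoint conditions are given there; no other values are assumed.
audio_alone = {-6: 72, -18: 8}
audiovisual = {-6: 93, -18: 77}


def visual_gain(snr_db):
    """Percentage-point gain from adding the speaker's face at a given S/N.

    A plain difference of scores is an illustrative choice, not the paper's
    stated metric.
    """
    return audiovisual[snr_db] - audio_alone[snr_db]


# Compute the gain at each quoted S/N, from the least to the most degraded.
gains = {snr: visual_gain(snr) for snr in sorted(audio_alone, reverse=True)}

for snr, gain in gains.items():
    print(f"S/N = {snr} dB: +{gain} points")
```

On these figures, the visual contribution grows from 21 points at -6 dB to 69 points at -18 dB, which is consistent with the claim that the face matters most when the acoustic signal is most degraded.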
Bibliographic reference. Benoît, Christian / Mohamadi, Tayeb (1992): "The lip benefit: auditory and visual intelligibility of French speech in noise", In ICSLP-1992, 951-954.