Auditory-Visual Speech Processing
Perceptual experiments on audio-visual consonant recognition of spectrally reduced speech (SRS) have been carried out with coherent and incoherent (McGurk) audio-visual pairs. The main interest of SRS in four sub-bands is that it partially suppresses the information transmitted about place of articulation. The integration of manner, restricted to the fricative/occlusive contrast, is also examined, and a new 'cross-manner' combination is tested. As expected, SRS yields good audio-visual complementarity and a high rate of McGurk responses, but new and interesting effects are also observed. To interpret the human confusions on place of articulation, the Bayesian model proposed by Massaro and Stork is compared with a new place identification model based on averaging as well as on the separate identification of articulatory features. This decomposition is a promising direction for the development of multi-stream speech recognition models.
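The abstract contrasts a Bayesian (multiplicative) integration model with an averaging-based one. The sketch below illustrates that generic contrast only; the fusion rules and the numeric supports are illustrative assumptions, not the authors' exact formulations or data.

```python
# Hedged sketch: two generic audio-visual fusion rules often contrasted
# in the integration literature. All numbers are made up for illustration.

def multiplicative_fusion(audio, video):
    """Bayesian/FLMP-style fusion: multiply per-category supports, renormalize."""
    raw = [a * v for a, v in zip(audio, video)]
    total = sum(raw)
    return [r / total for r in raw]

def averaging_fusion(audio, video, w=0.5):
    """Averaging fusion: convex combination of the two modalities, renormalized."""
    raw = [w * a + (1 - w) * v for a, v in zip(audio, video)]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical supports for three places of articulation
# (e.g. bilabial, alveolar, velar) under a McGurk-style conflict:
audio = [0.7, 0.2, 0.1]   # audio favors bilabial
video = [0.1, 0.2, 0.7]   # video favors velar

print(multiplicative_fusion(audio, video))
print(averaging_fusion(audio, video))
```

Under conflicting inputs, the multiplicative rule suppresses categories that either modality rates low, while averaging preserves them; this difference in predicted confusion patterns is what such model comparisons exploit.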
Bibliographic reference. Berthommier, Frédéric (2001): "Audio-visual recognition of spectrally reduced speech", In AVSP-2001, 183-188.