International Conference on Auditory-Visual Speech Processing 2008
Tangalooma Wild Dolphin Resort,
Moreton Island, Queensland, Australia
In this study, we measured detection and tolerance thresholds of auditory-visual asynchrony between time-expanded speech and a moving image of the talker's face. In the experiments, words were presented under two conditions: asynchrony produced by time-expanded speech (expansion condition: EXP) and asynchrony produced by a simple timing shift (asynchronous condition: ASYN). We used 16 shorter Japanese words (four morae) and 20 longer Japanese words (seven or eight morae). All auditory speech was presented in pink noise to avoid a ceiling effect. The SNRs for the shorter and longer words were set to -10 dB and -3.5 dB, respectively. For EXP, the auditory speech signals were analyzed and resynthesized using STRAIGHT to change the words' duration (Kawahara et al., 1998). The resynthesized auditory signals were combined with the visual signals so that the onset of the utterance was synchronous. For ASYN, the auditory speech signal was simply lagged behind the visual speech signal. Results showed that detection and tolerance thresholds for longer words were higher than those for shorter words. However, when the thresholds were recalculated as a function of the ratio of the expansion rate to word duration, these differences were not observed. These results suggest that detection and tolerance thresholds for auditory-visual asynchrony between time-expanded speech and a moving image of the talker's face might depend on the ratio of the expansion rate to word duration.
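As an illustration of the noise-masking step, the following minimal sketch shows one common way to scale a noise signal so that speech is presented at a target SNR such as -10 dB or -3.5 dB. It is not the authors' procedure: the function names are hypothetical, the SNR is assumed to be defined on mean power over the whole signal, and the placeholder signals (a sine tone and white Gaussian noise) merely stand in for recorded speech and pink noise.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that the speech-to-noise power ratio is snr_db.

    Assumption: SNR is defined on mean power over the whole signal.
    """
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise, sample by sample."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noise, gain):
    """SNR in dB after the noise has been scaled by `gain`."""
    return 10 * math.log10(mean_power(speech) / (gain ** 2 * mean_power(noise)))


if __name__ == "__main__":
    # Placeholder signals only: a sine tone instead of speech,
    # white Gaussian noise instead of pink noise.
    rng = random.Random(0)
    speech = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

    for label, snr_db in (("shorter words", -10.0), ("longer words", -3.5)):
        gain = noise_gain_for_snr(speech, noise, snr_db)
        mixed = mix_at_snr(speech, noise, snr_db)
        print(label, len(mixed), round(measured_snr_db(speech, noise, gain), 3))
```

A negative SNR means the scaled noise carries more power than the speech, which is consistent with the stated aim of keeping performance away from ceiling.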
Bibliographic reference. Sakamoto, Shuichi / Tanaka, Akihiro / Numahata, Shun / Imai, Atsushi / Takagi, Tohru / Suzuki, Yôiti (2008): "Effect of audio-visual asynchrony between time-expanded speech and a moving image of a talker's face on detection and tolerance thresholds", In AVSP-2008, 79-82.