Auditory-Visual Speech Processing (AVSP) 2010
Hakone, Kanagawa, Japan
Humans use speech to convey information, attract attention, express affect, and so on. Speech register research shows that humans are adept at fine-tuning components of their speech to accommodate the needs of their audience, suggesting that they have a model of others' communication needs. However, when that audience is a computer rather than another human, such a model may be invalid, and the resulting speech adaptations (Computer-Directed Speech) may be inappropriate. Here we examine humans' speech to other humans or to an auditory-visual avatar before and after the computer makes a listening error. Vowel durations are found to be longer in Computer- than Human-Directed Speech (especially in speech repairs after computer errors), and there is greater vowel hyperarticulation in Computer- than Human-Directed Speech both before and after error correction. The results are discussed in terms of human-computer interaction (HCI), talking head applications and ASR systems.
Index Terms: computer-directed speech, speech repairs, vowel hyperarticulation, human-computer interaction.
Bibliographic reference. Burnham, Denis / Joeffry, Sebastian / Rice, Lauren (2010): "d-o-e-s-not-c-o-m-p-u-t-e: vowel hyperarticulation in speech to an auditory-visual avatar", In AVSP-2010, paper P18.