13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

The "Audio-Visual Face Cover Corpus": Investigations into Audio-Visual Speech and Speaker Recognition When the Speaker's Face is Occluded by Facewear

Natalie Fecher

Department of Language and Linguistic Science, University of York, York, UK

The Audio-Visual Face Cover Corpus consists of high-quality audio and video recordings of 10 native British English speakers wearing different types of 'facewear'. Speakers read aloud a set of 64 /C1VC2/ syllables embedded in a carrier phrase. Eighteen English consonants occurred twice each in onset and coda positions. Speakers recited the list nine times: once in a control condition (no facewear) and eight times while wearing a forensically relevant face covering. Audio was captured simultaneously via a headband microphone and two shotgun microphones placed in front of and behind the speaker. Footage of the speaker's head and shoulders was filmed from two camera angles, frontal and half-profile. In total, 6,120 utterances were recorded per device. This paper aims to specify the database design, to introduce forensic-phonetic research utilising the data, and to demonstrate the corpus's potential applications in related fields of study and in casework conducted by forensic speech scientists.

Index Terms: speech database, audio-visual, forensic speech science, facewear, disguise, acoustic phonetics, perception

Bibliographic reference. Fecher, Natalie (2012): "The 'audio-visual face cover corpus': investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear", In INTERSPEECH-2012, 2250-2253.