International Conference on Auditory-Visual Speech Processing 2008
Tangalooma Wild Dolphin Resort,
Moreton Island, Queensland, Australia
This paper introduces forthcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. To achieve robust speech recognition in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has attracted particular attention over the past decade. Because many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to evaluate and compare these techniques and recognition schemes. Two audio-visual evaluation frameworks, CENSREC-1-AV and CENSREC-2-AV, are being built by the CENSREC project in Japan: CENSREC-1-AV includes artificially noise-added waveforms and image sequences, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora.
Index Terms: evaluation framework, audio-visual speech corpus, bimodal speech recognition, noisy environments.
Bibliographic reference. Tamura, Satoshi / Miyajima, Chiyomi / Kitaoka, Norihide / Hayamizu, Satoru / Takeda, Kazuya (2008): "CENSREC-AV: evaluation frameworks for audio-visual speech recognition", In AVSP-2008, 51-54.