Auditory-Visual Speech Processing (AVSP) 2011

Volterra, Italy
September 1-2, 2011

Improved Detection of Ball Hit Events in a Tennis Game Using Multimodal Information

Qiang Huang (1), Stephen Cox (1), Fei Yan (2), Teo de Campos (2), David Windridge (2), Josef Kittler (2), William Christmas (2)

(1) School of Computing Sciences, University of East Anglia, Norwich, UK
(2) Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK

We describe a novel framework for detecting ball hits in a tennis game by combining audio and visual information. Ball hit detection is a key step in understanding a game such as tennis, but single-mode approaches are not very successful: audio detection suffers from interfering noise and acoustic mismatch, while video detection is made difficult by the small size of the ball and the complex background of the surrounding environment. Our goal in this paper is to improve detection performance by focusing on high-level information (rather than low-level features), including the detected audio events, the ball’s trajectory, and inter-event timing information. Visual information supplies a coarse detection of the ball hit events, which is then used as a constraint for audio detection. In addition, useful gains in detection performance can be obtained by using inter-ball-hit timing information, which aids prediction of the next ball hit. This method seems to be very effective in reducing the interference present in low-level features. After applying this method to a women’s doubles tennis game, we obtained improvements in the F-score of about 30% (absolute) for audio detection and about 10% for video detection.
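
The idea of using coarse visual detections as a constraint on audio detections, and of scoring the result with an F-score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the time window, the matching tolerance, and all timestamps below are hypothetical values chosen only for illustration, and the greedy one-to-one matching used for scoring is an assumption rather than the paper's evaluation protocol.

```python
from typing import List


def constrain_audio_by_video(audio_hits: List[float],
                             video_hits: List[float],
                             window: float = 0.5) -> List[float]:
    """Keep only audio candidates within `window` seconds of a coarse video hit.

    `window` is an assumed parameter, not a value taken from the paper.
    """
    return [t for t in audio_hits
            if any(abs(t - v) <= window for v in video_hits)]


def f_score(detected: List[float],
            reference: List[float],
            tolerance: float = 0.1) -> float:
    """F1 score using greedy one-to-one matching within `tolerance` seconds."""
    unmatched = sorted(reference)
    true_positives = 0
    for t in sorted(detected):
        # Find the first unmatched reference hit close enough to this detection.
        match = next((r for r in unmatched if abs(r - t) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy timestamps in seconds (made up for illustration, not data from the paper).
reference = [1.0, 2.5, 4.0]                 # annotated ball hits
audio_hits = [1.02, 1.7, 2.48, 3.4, 4.05]   # audio detections, two spurious
video_hits = [0.9, 2.7, 4.2]                # coarse, imprecise video detections

fused = constrain_audio_by_video(audio_hits, video_hits, window=0.5)
audio_only_f = f_score(audio_hits, reference)
fused_f = f_score(fused, reference)
print(fused, audio_only_f, fused_f)
```

In this toy case the spurious audio detections fall outside every video window and are discarded, so precision rises while recall is unchanged. Inter-ball-hit timing could be added in a similar way, for example by down-weighting candidates that arrive implausibly soon after the previous hit, but that step is not sketched here.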

Index Terms. Scene analysis, multimodal information integration

Bibliographic reference. Huang, Qiang / Cox, Stephen / Yan, Fei / Campos, Teo de / Windridge, David / Kittler, Josef / Christmas, William (2011): "Improved detection of ball hit events in a tennis game using multimodal information", In AVSP-2011, 127-130.