Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

The Vocal Joystick Data Collection Effort and Vowel Corpus

Kelley Kilanski, Jonathan Malkin, Xiao Li, Richard Wright, Jeff A. Bilmes

University of Washington, USA

Vocal Joystick (VJ) is a mechanism that enables individuals with motor impairments to use vocal parameters to control objects on a computer screen (buttons, sliders, etc.); ultimately, it will be used to control electro-mechanical instruments (e.g., robotic arms, wireless home automation devices). To train the VJ-system, speech data from the TIMIT speech corpus was initially used. However, because of problems with co-articulation, we began a large data collection effort in a controlled environment that would not only address these problems but also yield a new vowel corpus representative of the utterances a user of the VJ-system would produce. The data collection process evolved over the course of the effort as new parameters were added and as factors affecting the quality of the collected data with respect to those parameters were considered. The result is a vowel corpus of approximately 11 hours of recorded data, comprising approximately 23,500 sound files of the monophthongs and vowel combinations (e.g., diphthongs) chosen for the Vocal Joystick project, varying along the parameters of duration, intensity and amplitude. This paper discusses how the data collection has evolved since its initiation and provides a brief summary of the resulting corpus.

Bibliographic reference. Kilanski, Kelley / Malkin, Jonathan / Li, Xiao / Wright, Richard / Bilmes, Jeff A. (2006): "The vocal joystick data collection effort and vowel corpus", In INTERSPEECH-2006, paper 1885-Tue2WeO.2.