INTERSPEECH 2006 - ICSLP
Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

A Multilingual Embodied Conversational Agent for Tutoring Speech and Language Learning

Dominic W. Massaro (1), Ying Liu (2), Trevor H. Chen (1), Charles Perfetti (2)

(1) University of California at Santa Cruz, USA; (2) University of Pittsburgh, USA

Speech and language science and technology evolved under the assumption that speech was a solely auditory event. However, a burgeoning record of research findings reveals that our perception and understanding are influenced by a speaker's face and accompanying gestures, as well as the actual sound of the speech. Perceivers expertly use these multiple sources of information to identify and interpret the language input. Given the value of face-to-face interaction, our persistent goal has been to develop, evaluate, and apply animated agents to produce realistic and accurate speech. Baldi is an accurate three-dimensional animated talking head appropriately aligned with either synthesized or natural speech. Baldi has a realistic tongue and palate, which can be shown by making his skin transparent. Based on this research and technology, we have implemented computer-assisted speech and language tutors for children with language challenges and for all persons learning a second language. Our language-training program utilizes Baldi (or his likeness) as the conversational agent, who guides students through a variety of exercises designed to teach vocabulary and grammar, to improve speech articulation, and to develop linguistic and phonological awareness. We have also implemented multilingual agents, using a client/server architecture. This system has been used to develop Bao, a Mandarin talker, which has been used in an initial training study for college students learning Mandarin as a new language. The results address the potential for using visible speech technology and pedagogy in language learning of both similar segments in the two languages and new speech segments in the new language. Although visible speech did not facilitate pronunciation learning relative to auditory speech alone, we expect that a more prolonged training period would show an advantage of visible speech.
Some of the advantages of the Baldi pedagogy and technology include the popularity and proven effectiveness of computers and embodied conversational agents, the perpetual availability of the program, and individualized instruction. The science and technology of Baldi hold great promise in language learning, dialog, human-machine interaction, education, and edutainment.


Bibliographic reference.  Massaro, Dominic W. / Liu, Ying / Chen, Trevor H. / Perfetti, Charles (2006): "A multilingual embodied conversational agent for tutoring speech and language learning", In INTERSPEECH-2006, paper 1313-Tue1WeS.3.