13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Less Errors with TTS? A Dictation Experiment with Foreign Language Learners

Thomas Pellegrini (1), Ângela Costa (1,2), Isabel Trancoso (1,3)

(1) Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Lisbon, Portugal
(2) Universidade Nova de Lisboa, Lisbon, Portugal
(3) Instituto Superior Técnico, Lisbon, Portugal

This article reports a contrastive study on the use of Text-To-Speech (TTS) synthesis instead of pre-recorded utterances in a dictation exercise given to students of European Portuguese as a second language (PSL). Forty sentences were extracted from a PSL textbook. Twenty of them were synthesized, and the other twenty were taken directly from the book's pre-recorded audio material. The learners were asked to transcribe the audio sentences orthographically; the sentences were presented in random order. The synthetic utterances turned out to be easier to transcribe than the human ones, with word error rates (WER) of 26.6% and 33.9%, respectively. This result was somewhat surprising, since the synthetic voice was not built for learning purposes. Potential explanatory factors were the lower speech rate and the less reduced pronunciation that characterized the TTS voice.
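A word error rate counts the substitutions, deletions and insertions needed to turn a learner's transcription into the reference sentence, divided by the number of reference words. The abstract does not detail the scoring procedure, so the Python sketch below assumes the standard word-level Levenshtein alignment, pooled over all sentences of one condition (TTS or human). The function names and the strict matching of accents, case and punctuation are hypothetical choices made for illustration, not the authors' method.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    turning `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (only one DP row is kept at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or 0 for an exact match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total edits divided by total reference words.
    Tokenization is plain whitespace splitting, so accents, case and
    punctuation must match exactly (a hypothetical scoring choice)."""
    errors = words = 0
    for reference, transcription in pairs:
        ref, hyp = reference.split(), transcription.split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words to score")
    return errors / words


# A learner drops the accent on "está" and omits "em": 2 errors / 5 words.
print(corpus_wer([("o gato está em casa", "o gato esta casa")]))  # 0.4
```

Averaging per-sentence WERs instead of pooling would weight short sentences more heavily and can give a different figure; which convention was used is not stated in the abstract. On the reported figures, 26.6% versus 33.9% is an absolute difference of 7.3 points, roughly a 21.5% relative reduction for the synthetic utterances.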

Index Terms: Computer-Assisted Language Learning, speech synthesis, dictation, European Portuguese


Bibliographic reference.  Pellegrini, Thomas / Costa, Ângela / Trancoso, Isabel (2012): "Less errors with TTS? a dictation experiment with foreign language learners", In INTERSPEECH-2012, 1291-1294.