13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Predicting Character-Appropriate Voices for a TTS-based Storyteller System

Erica Greene (1), Taniya Mishra (2), Patrick Haffner (2), Alistair Conkie (2)

(1) Computer Science Department, The University of Southern California, Los Angeles, CA, USA
(2) AT&T Research, Florham Park, NJ, USA

Using distinct and appropriate synthetic voices to voice the characters in a children's story would make a TTS-based digital storyteller system more engaging and entertaining, and also help listeners comprehend the story better. However, automatically predicting appropriate voices for storybook characters is a non-trivial problem.
   In this paper, we present a data-driven approach to predicting the most appropriate voices for different characters in children's stories based on salient character attributes. We use Mechanical Turk to identify the character attributes that most strongly evoke listeners' perception that a specific character should have a particular voice, and to label the voices in our collection with attribute tags. We use Naive Bayes to model the attribute-to-voice relationship. We evaluated the system objectively, and results significantly above chance show our approach to be viable.
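
As a rough illustration of the attribute-to-voice modeling step, the sketch below trains a small Bernoulli Naive Bayes classifier that maps a set of character attribute tags to a voice label. The abstract does not specify the Naive Bayes variant, the attribute inventory, or the voice set, so everything named here is an assumption made for illustration: the attribute names, the voice labels (`voice_A`, `voice_B`, `voice_C`), the toy training examples, the Laplace smoothing constant `alpha`, and the helper functions `train_naive_bayes` and `predict_voice`.

```python
import math
from collections import defaultdict

# Hypothetical attribute inventory; the paper's actual attributes are not listed here.
ATTRIBUTES = ["male", "female", "old", "young"]

def train_naive_bayes(examples, attributes, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model.

    examples: list of (set_of_attribute_tags, voice_label) pairs.
    Returns {voice: (log_prior, {attribute: P(attribute present | voice)})}.
    """
    voice_counts = defaultdict(int)
    tag_counts = defaultdict(lambda: defaultdict(int))
    for tags, voice in examples:
        voice_counts[voice] += 1
        for tag in tags:
            tag_counts[voice][tag] += 1

    total = len(examples)
    model = {}
    for voice, n in voice_counts.items():
        log_prior = math.log(n / total)
        # Laplace-smoothed probability that each attribute is present for this voice.
        p_present = {
            a: (tag_counts[voice][a] + alpha) / (n + 2 * alpha)
            for a in attributes
        }
        model[voice] = (log_prior, p_present)
    return model

def predict_voice(model, attributes, character_tags):
    """Return the voice with the highest posterior log-score for the given tags."""
    best_voice, best_score = None, -math.inf
    for voice, (log_prior, p_present) in model.items():
        score = log_prior
        for a in attributes:
            p = p_present[a]
            # Present attributes contribute log p, absent ones log(1 - p).
            score += math.log(p if a in character_tags else 1.0 - p)
        if score > best_score:
            best_voice, best_score = voice, score
    return best_voice

# Toy, made-up voice labels standing in for crowd-sourced attribute tags.
toy_examples = [
    ({"male", "old"}, "voice_A"),
    ({"male", "old"}, "voice_A"),
    ({"female", "young"}, "voice_B"),
    ({"female", "young"}, "voice_B"),
    ({"male", "young"}, "voice_C"),
    ({"male", "young"}, "voice_C"),
]

model = train_naive_bayes(toy_examples, ATTRIBUTES)
grandfather_voice = predict_voice(model, ATTRIBUTES, {"male", "old"})
girl_voice = predict_voice(model, ATTRIBUTES, {"female", "young"})
```

With this toy data, a character tagged as male and old is assigned `voice_A`, and one tagged as female and young is assigned `voice_B`. A real system would replace the toy examples with the tagged voice collection described above.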

Index Terms: Speech synthesis, TTS, expressive speech, child-directed speech applications.


Bibliographic reference.  Greene, Erica / Mishra, Taniya / Haffner, Patrick / Conkie, Alistair (2012): "Predicting character-appropriate voices for a TTS-based storyteller system", In INTERSPEECH-2012, 2210-2213.