Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

The Blizzard Challenge: Evaluating Corpus-based Speech Synthesis Techniques

Alan W. Black

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

The Blizzard Challenge was started in 2005 as a way to evaluate different corpus-based speech synthesis techniques on a common data set. It has been noted that it is very hard to compare speech synthesis techniques when voices are built from databases of different sizes and quality. To remove database size and speaker quality as variables, we proposed a common database that all participants would use. The Challenge itself is for participants to take the given database (or databases) and build a voice using their own voice-building software. After a short time, a set of test sentences is released, which each participant's system must synthesize. The synthesized utterances are collected together and a web-based listening test is set up. Two types of listening test are carried out: a simple MOS (mean opinion score) test, and a set of understandability tests in which the listener is asked to type in what they hear.
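The abstract does not say how the typed responses from the understandability tests are scored. A common choice for this kind of transcription test is word error rate (WER): the word-level edit distance between the text the system was asked to say and what the listener typed, divided by the number of words in the reference. The sketch below assumes that scoring scheme, plus a simple normalization (lowercasing, stripping punctuation) that is our own illustrative choice rather than a documented Blizzard rule; `word_error_rate` and `system_wer` are hypothetical helper names.

```python
import string

def _normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase and drop punctuation before splitting into words.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def word_errors(reference: str, typed: str) -> tuple[int, int]:
    """Return (word-level Levenshtein distance, number of reference words)."""
    ref, hyp = _normalize(reference), _normalize(typed)
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)], len(ref)

def word_error_rate(reference: str, typed: str) -> float:
    errors, n_ref = word_errors(reference, typed)
    return errors / max(n_ref, 1)

def system_wer(pairs: list[tuple[str, str]]) -> float:
    """Pool errors over all (reference, typed) pairs for one system."""
    total_errors = total_words = 0
    for reference, typed in pairs:
        errors, n_ref = word_errors(reference, typed)
        total_errors += errors
        total_words += n_ref
    return total_errors / max(total_words, 1)
```

Pooling errors over all of a system's sentences (rather than averaging per-sentence rates) keeps short sentences from dominating the score; either convention is defensible, and this choice is an assumption of the sketch.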

Three sets of listeners are used: speech experts (provided by the participants' groups), volunteers (recruited through web advertising), and paid undergraduate native speakers. Each year the results have been presented at a workshop, where participants describe their systems and the final results are given.
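Because three different listener populations take part, MOS results are naturally reported per system and per listener group, so that, for example, expert and paid-listener opinions can be compared. The following sketch shows that aggregation under stated assumptions: ratings arrive as `(listener_group, system_id, score)` tuples, scores lie on a 1 to 5 scale, and the group labels and system letters are hypothetical placeholders, not the Challenge's actual data format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rating record: (listener_group, system_id, score), score assumed on a 1-5 scale.
Rating = tuple[str, str, int]

def mos_by_group(ratings: list[Rating]) -> dict[tuple[str, str], float]:
    """Mean opinion score for every (listener_group, system_id) pair."""
    buckets: dict[tuple[str, str], list[int]] = defaultdict(list)
    for group, system, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} outside assumed 1-5 scale")
        buckets[(group, system)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

ratings = [
    ("expert", "A", 4), ("expert", "A", 5),
    ("paid", "A", 3), ("paid", "B", 2),
]
print(mos_by_group(ratings))
```

Keeping the listener group in the key, rather than pooling everything, lets differences between experts, volunteers, and paid listeners show up instead of being averaged away.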

The Challenge has brought together academic and industrial groups from around the world, both established and new. The results have been both interesting and unexpected.

We see the Challenge as a long-term, evolving event, and modifications to its basic structure are considered each year. Open questions include: how to test whether speaker identity is preserved in voice-conversion-based systems; how to test multi-sentence synthesis; how to handle multilingual databases; and who should run the Challenge.

No individual results will be presented in this talk; instead, overall trends will be given, along with a discussion of future directions for Blizzard.

The motivation for and details of the Challenge are described more fully in [Black and Tokuda 2005]. All the presentations, including anonymized results, are also available online at http://festvox.org/blizzard/ .

Reference


Bibliographic reference: Black, Alan W. (2007): "The Blizzard Challenge: evaluating corpus-based speech synthesis techniques", In SSW6-2007, 392.