13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Text-To-Speech Intelligibility Across Speech Rates

Ann K. Syrdal (1), H. Timothy Bunnell (2), Susan R. Hertz (3), Taniya Mishra (1), Murray Spiegel (4), Corine Bickley (5), Deborah Rekart (6), Matthew J. Makashay (7)

(1) AT&T Labs – Research, Florham Park, NJ, USA
(2) Nemours Biomedical Research, Alfred I. duPont Hospital for Children, Wilmington, DE, USA
(3) NovaSpeech LLC and Dept. of Linguistics, Cornell University, Ithaca, NY, USA
(4) Speech Applications Research, Applied Communication Sciences, Piscataway, NJ, USA
(5) Dept. of Hearing, Speech and Language Sciences, Gallaudet University, Washington, DC, USA
(6) AT&T Services, Dallas, TX, USA
(7) Audiology & Speech Center, Walter Reed National Military Medical Center, Bethesda, MD, USA

A web-based listening test measured the intelligibility of 8 TTS systems and a linearly time-compressed human reference voice across speech rates. Four synthesis methods were compared: formant, diphone concatenation, unit selection concatenation, and HMM synthesis. For each TTS method, a female and a male American English voice from each of 2 independent synthesis engines were tested. Semantically unpredictable sentences were presented at 6 speech rates from 200 to 450 words per minute. In an open response format, listeners typed what they heard. Listener transcriptions were automatically scored at the word level, and a normalized edit distance per speech rate was calculated for each of 355 listeners. There were significant differences among the TTS systems. The two unit selection TTS systems were the most intelligible across speech rates; one was equivalent to human speech. Listeners' native language, TTS familiarity, and audio equipment were also significant factors.
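
The word-level scoring described above can be illustrated with a minimal sketch. The abstract does not specify the tokenization or the normalization used, so everything here is an assumption made for illustration: the function names are hypothetical, words are lowercased and stripped of surrounding punctuation, and the Levenshtein distance is divided by the longer of the two word sequences so that scores fall between 0 (perfect transcription) and 1.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (assumed)."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(reference, transcription):
    """Edit distance divided by the longer sequence length (assumed normalization)."""
    ref, hyp = tokenize(reference), tokenize(transcription)
    longest = max(len(ref), len(hyp))
    if longest == 0:
        return 0.0
    return word_edit_distance(ref, hyp) / longest


def mean_distance_by_rate(trials):
    """Average the score per speech rate for one listener.

    `trials` is a list of (rate_wpm, reference, transcription) tuples,
    a hypothetical data layout chosen for this sketch.
    """
    scores = defaultdict(list)
    for rate, reference, transcription in trials:
        scores[rate].append(normalized_edit_distance(reference, transcription))
    return {rate: sum(v) / len(v) for rate, v in scores.items()}
```

For example, a listener who types "a x c" for the reference "a b c d" makes one substitution and one deletion over four words, which gives a score of 0.5 under these assumptions.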

Index Terms: speech synthesis, text-to-speech, intelligibility, speech rate

Bibliographic reference. Syrdal, Ann K. / Bunnell, H. Timothy / Hertz, Susan R. / Mishra, Taniya / Spiegel, Murray / Bickley, Corine / Rekart, Deborah / Makashay, Matthew J. (2012): "Text-to-speech intelligibility across speech rates", in INTERSPEECH-2012, 623-626.