Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Integrating a Fast Speech Corpus in Unit Selection Speech Synthesis: Experiments on Perception, Segmentation, and Duration Prediction

Donata Moers (1,2), Petra Wagner (2), Bernd Möbius (1,3), Filip Müllers (1), Igor Jauk (1)

(1) Institut für Kommunikationswissenschaften, Abt. Sprache und Kommunikation, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
(2) Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Germany
(3) Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Germany

This paper examines viable ways of integrating a fast speech corpus into a unit selection synthesis system. After a suitable speaker was selected, two inventories were recorded: one at normal and one at fast speech rate, articulated as accurately as possible. A perceptual evaluation showed that, at ultra-fast speech rates, stimuli generated from fast-rate utterances were judged to be as intelligible as stimuli generated from normal-rate utterances; moreover, they were clearly preferred with respect to naturalness. Automatic phone segmentation produced only marginal differences in label timing accuracy between the two corpora. On this basis, CART-based duration prediction models were built for both corpora, and their prediction accuracy was very similar. We conclude that automatic phone segmentation and CART-based duration prediction are applicable to both normal-rate and fast-rate recordings.
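The abstract names CART-based duration prediction but gives no details. The sketch below is illustrative only. It grows a small regression tree using binary "feature == value" splits that minimise the sum of squared errors, then compares prediction error (RMSE) on a normal-rate and a fast-rate toy corpus. Several choices are hypothetical and are not the paper's feature set, data, or results: the feature names `cls` and `stress`, the stopping parameters `max_depth` and `min_leaf`, and all durations.

```python
from math import sqrt
from statistics import mean


def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=4, min_leaf=2):
    """Grow a CART-style regression tree with binary 'feature == value' splits.

    rows: list of dicts mapping (hypothetical) categorical feature names to values.
    targets: phone durations in ms, one per row.
    """
    node_mean = mean(targets)
    if depth >= max_depth or len(targets) < 2 * min_leaf:
        return {"leaf": node_mean}
    parent_sse = sse(targets)
    best = None  # (gain, feature, value)
    for feat in rows[0]:
        for value in sorted({r[feat] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[feat] == value]
            right = [t for r, t in zip(rows, targets) if r[feat] != value]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent_sse - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, feat, value)
    if best is None or best[0] <= 0:
        return {"leaf": node_mean}
    _, feat, value = best
    yes = [i for i, r in enumerate(rows) if r[feat] == value]
    no = [i for i, r in enumerate(rows) if r[feat] != value]
    return {
        "feat": feat,
        "value": value,
        "yes": build_tree([rows[i] for i in yes], [targets[i] for i in yes],
                          depth + 1, max_depth, min_leaf),
        "no": build_tree([rows[i] for i in no], [targets[i] for i in no],
                         depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Descend the tree to a leaf and return its mean duration."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feat"]] == tree["value"] else tree["no"]
    return tree["leaf"]


def rmse(tree, rows, targets):
    """Root-mean-square error of predicted vs. observed durations (ms)."""
    return sqrt(mean((predict(tree, r) - t) ** 2 for r, t in zip(rows, targets)))


# Hypothetical toy data: identical contexts, durations invented for illustration.
rows = [
    {"cls": "vowel", "stress": "yes"}, {"cls": "vowel", "stress": "yes"},
    {"cls": "vowel", "stress": "no"}, {"cls": "vowel", "stress": "no"},
    {"cls": "cons", "stress": "yes"}, {"cls": "cons", "stress": "yes"},
    {"cls": "cons", "stress": "no"}, {"cls": "cons", "stress": "no"},
]
normal_ms = [120, 130, 80, 70, 70, 60, 50, 40]
fast_ms = [72, 78, 48, 42, 42, 36, 30, 24]

normal_tree = build_tree(rows, normal_ms)
fast_tree = build_tree(rows, fast_ms)

if __name__ == "__main__":
    print("normal-rate RMSE (ms):", rmse(normal_tree, rows, normal_ms))  # 5.0
    print("fast-rate RMSE (ms):", rmse(fast_tree, rows, fast_ms))        # 3.0
```

In this toy setup, one model is trained per corpus and evaluated on that same corpus. A real comparison would use held-out test data and a much richer context-feature set.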

Index Terms: fast speech, unit selection, duration prediction


Bibliographic reference.  Moers, Donata / Wagner, Petra / Möbius, Bernd / Müllers, Filip / Jauk, Igor (2010): "Integrating a fast speech corpus in unit selection speech synthesis: experiments on perception, segmentation, and duration prediction", In SP-2010, paper 189.