Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Semi-Supervised Learning of Acoustic Driven Prosodic Phrase Breaks for Text-to-Speech Systems

Kishore Prahallad (1,2), E. Veera Raghavendra (1), Alan W. Black (2)

(1) International Institute of Information Technology, Hyderabad, India.
(2) Language Technologies Institute, Carnegie Mellon University, USA.

In this paper, we propose semi-supervised learning of acoustic-driven phrase breaks and consider its usefulness for text-to-speech systems. We first derive a set of initial hypotheses of phrase breaks in a speech signal, using pauses as the acoustic cue. Because these initial estimates are based on knowledge of speech production and speech signal processing, the hypothesized phrase break regions can be treated as labeled data. Features such as duration, F0 and energy are extracted from these labeled regions, and a machine learning model is trained to classify the acoustic features as belonging to a phrase break or not. We then attempt to bootstrap the machine learning model using the unlabeled data (i.e., the rest of the data).
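
The bootstrapping loop described in the abstract can be illustrated with a minimal self-training sketch. It is not the authors' implementation: the nearest-centroid classifier, the pause threshold `long_pause`, the confidence `margin`, the choice to treat pause-free boundaries as negative examples, and the toy feature values are all assumptions made for illustration. Each word boundary is represented as a pause length plus a hypothetical normalized (duration, F0, energy) feature tuple.

```python
from math import dist


def initial_labels(boundaries, long_pause=0.15):
    """Seed labels from pauses alone (assumed rule, not from the paper):
    long pause -> break (1), no pause -> non-break (0), otherwise unlabeled."""
    labeled, unlabeled = [], []
    for pause, feats in boundaries:
        if pause >= long_pause:
            labeled.append((feats, 1))
        elif pause == 0.0:
            labeled.append((feats, 0))
        else:
            unlabeled.append(feats)
    return labeled, unlabeled


def fit(labeled):
    """Return one centroid per class (assumes both classes are present)."""
    def centroid(rows):
        return tuple(sum(col) / len(rows) for col in zip(*rows))
    return {c: centroid([x for x, y in labeled if y == c]) for c in (0, 1)}


def predict(x, centroids):
    """Assign x to the class with the nearest centroid."""
    return 1 if dist(x, centroids[1]) < dist(x, centroids[0]) else 0


def self_train(labeled, unlabeled, margin=0.5, max_rounds=10):
    """Repeatedly move confidently classified points into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        confident, remaining = [], []
        for x in pool:
            # Confidence here is the gap between the two centroid distances.
            gap = abs(dist(x, centroids[1]) - dist(x, centroids[0]))
            if gap >= margin:
                confident.append((x, predict(x, centroids)))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return fit(labeled), labeled, pool


# Toy data: (pause in seconds, (duration, F0 change, energy change)).
boundaries = [
    (0.30, (0.90, 0.80, 0.70)),
    (0.25, (0.80, 0.90, 0.80)),
    (0.00, (0.10, 0.10, 0.20)),
    (0.00, (0.20, 0.10, 0.10)),
    (0.05, (0.85, 0.80, 0.75)),
    (0.04, (0.15, 0.20, 0.10)),
    (0.06, (0.50, 0.50, 0.45)),
]

seed, rest = initial_labels(boundaries)
model, final_labeled, still_unlabeled = self_train(seed, rest)
```

In this sketch only boundaries that the current model classifies with a clear margin are added to the training set, so ambiguous boundaries stay unlabeled rather than reinforcing an uncertain guess. A real system would likely use a stronger classifier and features extracted from the speech signal.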

Index Terms: speech synthesis, acoustic-driven phrasing, semi-supervised learning

Bibliographic reference. Prahallad, Kishore / Raghavendra, E. Veera / Black, Alan W. (2010): "Semi-supervised learning of acoustic driven prosodic phrase breaks for text-to-speech systems", in SP-2010, paper 151.