Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Data-Driven Unsupervised Adaptation of Acoustic-Prosodic Models

Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan

Speech Analysis and Interpretation Laboratory, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA

Training categorical prosody models for spoken language systems requires a significant amount of speech data annotated with the discrete labels of interest (such as boundary marks or word prominence information). In practice, the difficulty and expense of producing corpora with rich prosodic transcriptions severely limit the integration of such models into applications. In this paper, we explore the possibility of using a large, unlabeled corpus to adapt, in an unsupervised fashion, acoustic-prosodic models trained on a small, human-annotated seed dataset. Our experiments show that the proposed adaptation scheme improves the ability of the acoustic-prosodic model to distinguish between prosodic categories. On a test set derived from the Boston University Radio News Corpus, the adapted models reduced the pitch accent detection error rate by 4.3% relative to the seed acoustic-prosodic models trained on the annotated data.
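The abstract does not spell out the adaptation procedure. To illustrate the general idea of seeding a model on a small labeled set and then adapting it on unlabeled data, the sketch below uses a generic expectation-maximization (EM) scheme. It rests on several assumptions that are not from the paper: each token has a single scalar acoustic-prosodic feature (hypothetically, something like a normalized pitch range), each of two categories (e.g. unaccented vs. accented) is modeled by one Gaussian, and the functions `train_seed`, `adapt_em` and `classify` are hypothetical names. It is not necessarily the authors' method.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def train_seed(features, labels, n_classes=2):
    """Hypothetical supervised seed model: per-class (prior, mean, variance) from a small labeled set."""
    model = []
    for c in range(n_classes):
        xs = [x for x, y in zip(features, labels) if y == c]
        mean = sum(xs) / len(xs)
        var = max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-6)  # variance floor
        model.append((len(xs) / len(features), mean, var))
    return model


def posteriors(model, x):
    """Class posteriors P(c | x) under the current model (Bayes' rule)."""
    joint = [prior * gauss_pdf(x, mean, var) for prior, mean, var in model]
    total = sum(joint)
    if total == 0.0:  # numerical underflow far from every class: fall back to uniform
        return [1.0 / len(model)] * len(model)
    return [j / total for j in joint]


def adapt_em(model, unlabeled, n_iter=20):
    """Hypothetical unsupervised adaptation: EM on unlabeled data, initialized from the seed model."""
    for _ in range(n_iter):
        # E-step: soft-label every unlabeled token with the current model.
        gammas = [posteriors(model, x) for x in unlabeled]
        # M-step: re-estimate priors, means and variances from the soft counts.
        new_model = []
        for c in range(len(model)):
            w = [g[c] for g in gammas]
            n_c = sum(w)
            mean = sum(wi * x for wi, x in zip(w, unlabeled)) / n_c
            var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, unlabeled)) / n_c
            new_model.append((n_c / len(unlabeled), mean, max(var, 1e-6)))
        model = new_model
    return model


def classify(model, x):
    """Return the maximum a posteriori class index for feature value x."""
    post = posteriors(model, x)
    return max(range(len(post)), key=post.__getitem__)


if __name__ == "__main__":
    # Toy data (hypothetical): class 0 ~ N(0, 1), class 1 ~ N(3, 1).
    seed = train_seed([-1.0, 0.0, 0.8, 1.8, 2.6, 3.4], [0, 0, 0, 1, 1, 1])
    rng = random.Random(0)
    unlabeled = [rng.gauss(0.0, 1.0) for _ in range(1000)] + [rng.gauss(3.0, 1.0) for _ in range(1000)]
    adapted = adapt_em(seed, unlabeled)
    print("seed class means:   ", [round(m, 2) for _, m, _ in seed])
    print("adapted class means:", [round(m, 2) for _, m, _ in adapted])
```

One design consideration: EM maximizes the likelihood of the unlabeled data, not agreement with the prosodic labels, so it is the seed initialization that ties each mixture component to a category such as "accented".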


Bibliographic reference.  Ananthakrishnan, Sankaranarayanan / Narayanan, Shrikanth (2008): "Data-driven unsupervised adaptation of acoustic-prosodic models", In SP-2008, 161-164.