Speech Prosody 2008
Training categorical prosody models for spoken language systems requires a significant amount of speech data annotated with the discrete labels of interest (such as boundary marks or word prominence information). In practice, the difficulty and expense of producing corpora with rich prosodic transcriptions severely limit their integration within applications. In this paper, we explore the possibility of using a large, unlabeled corpus to adapt, in an unsupervised fashion, acoustic-prosodic models trained from a small, human-annotated seed dataset. Our experiments show that the proposed adaptation scheme improves the ability of the acoustic-prosodic model to distinguish between prosodic categories. On a test set derived from the Boston University Radio News Corpus, the adapted models reduced the pitch accent detection error rate by 4.3% relative to the seed acoustic-prosodic models trained from the annotated data.
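The abstract does not spell out the adaptation procedure, so the following is only a minimal sketch of one generic way to adapt a seed model with unlabeled data, namely self-training with confident pseudo-labels. It is not the authors' method. The nearest-mean classifier, the one-dimensional feature, the `min_margin` threshold, and all data values are assumptions made purely for illustration.

```python
from statistics import mean


def fit_means(samples):
    """Estimate one mean per class from (feature, label) pairs."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(means, x):
    """Return (label, margin) for feature x.

    The label is the class with the nearest mean; the margin is how much
    closer that mean is than the runner-up. Assumes at least two classes.
    """
    ranked = sorted(means, key=lambda y: abs(x - means[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - means[second]) - abs(x - means[best])
    return best, margin


def self_train(seed, unlabeled, min_margin=0.5, rounds=5):
    """Adapt class means using confidently pseudo-labeled unlabeled data."""
    means = fit_means(seed)
    for _ in range(rounds):
        # Keep only unlabeled points the current model labels confidently.
        pseudo = []
        for x in unlabeled:
            label, margin = predict(means, x)
            if margin >= min_margin:
                pseudo.append((x, label))
        # Re-estimate from the seed data plus the pseudo-labeled points.
        new_means = fit_means(seed + pseudo)
        if new_means == means:  # nothing changed, so stop early
            break
        means = new_means
    return means


# Hypothetical data: a small annotated seed set and a larger unlabeled pool.
seed = [(0.0, "unaccented"), (0.4, "unaccented"),
        (1.6, "accented"), (2.0, "accented")]
unlabeled = [-0.5, -0.3, 0.1, 0.9, 1.1, 2.3, 2.6, 2.4]

seed_means = fit_means(seed)
adapted = self_train(seed, unlabeled)
```

In this toy setting the ambiguous points near the class boundary (0.9 and 1.1) fall below the margin threshold and are never pseudo-labeled, so only confidently classified points shift the class means. The threshold trades off how much unlabeled data is used against the risk of reinforcing the seed model's own errors.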
Bibliographic reference. Ananthakrishnan, Sankaranarayanan / Narayanan, Shrikanth (2008): "Data-driven unsupervised adaptation of acoustic-prosodic models", In SP-2008, 161-164.