Automatic, model-based detection of pause-less phrase boundaries from fundamental frequency and duration features

Mahsa Sadat Elyasi Langarani, Jan van Santen


Prosodic phrase boundaries (PBs) are a key aspect of spoken communication. In automatic PB detection, it is common to use local acoustic features, textual features, or a combination of both. Most approaches, regardless of the features used, succeed in detecting major PBs (break score “4” in ToBI annotation, typically involving a pause), while detection of intermediate PBs (break score “3” in ToBI annotation) remains challenging. In this study we investigate the detection of intermediate, “pause-less” PBs with prosodic models, using a new corpus characterized by strong prosodic dynamics and an existing (CMU) corpus. We show that duration and fundamental frequency modeling can improve detection of these PBs, as measured by the F1 score, compared to Festival, which uses only textual features to detect PBs. We believe that this study contributes to our understanding of the prosody of phrase breaks.


DOI: 10.21437/SSW.2016-1

Cite as

Elyasi Langarani, M.S., van Santen, J. (2016) Automatic, model-based detection of pause-less phrase boundaries from fundamental frequency and duration features. Proc. 9th ISCA Speech Synthesis Workshop, 1-6.

BibTeX
@inproceedings{ElyasiLangarani+2016,
author={Mahsa Sadat Elyasi Langarani and Jan van Santen},
title={Automatic, model-based detection of pause-less phrase boundaries from fundamental frequency and duration features},
year=2016,
booktitle={9th ISCA Speech Synthesis Workshop},
doi={10.21437/SSW.2016-1},
url={http://dx.doi.org/10.21437/SSW.2016-1},
pages={1--6}
}