Fourth ISCA ITRW on Speech Synthesis

August 29 - September 1, 2001
Perthshire, Scotland

Prosodic Phrasing: Machine and Human Evaluation

M. Céu Viana (1), Luis C. Oliveira (2), and Ana I. Mata (3)

(1) CLUL, (2) INESC-ID/OST, (3) FLUL-CLUL, Lisbon, Portugal

In this paper we describe a set of experiments aimed at building and evaluating a new phrasing module for European Portuguese Text-to-Speech Synthesis, using Classification and Regression Tree (CART) techniques on hand-labeled texts. Using the assessment criterion of matching boundary predictions against a reference example of phrased sentences, the best solution found so far achieves an overall performance of 91.9%, with 86.3% of breaks correctly assigned and 4.3% false insertions. Although in absolute terms such scores may be considered surprisingly good given the size of the training set, the total number of exact matches at the sentence level is much lower. This suggested a more formal experiment to test the acceptability of the predicted phrasing in the judgment of human evaluators. The experiment involved 99 participants, who were asked to grade both the predicted and the reference phrasing and to indicate where they thought the breaks should be placed. The results showed that, as expected, there is large variability among subjects in the acceptance of a specific partitioning. However, the automatic assignment procedure is rated more favorably by human evaluators than the sentence-level exact-match count would suggest.
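The matching-based assessment described above can be illustrated with a minimal sketch. It assumes each sentence is encoded as a list of 0/1 labels, one per word juncture (1 = break), and it assumes one common convention for the scores: overall performance as the fraction of junctures labeled correctly, breaks correct as the fraction of reference breaks that were predicted, and false insertions as predicted breaks with no reference break, divided by the total number of junctures. The function name, the encoding, and the toy data are hypothetical; the paper's exact definitions may differ.

```python
from typing import Dict, List


def phrasing_scores(reference: List[List[int]],
                    predicted: List[List[int]]) -> Dict[str, float]:
    """Compare predicted break labels with reference labels.

    Each inner list holds one 0/1 label per word juncture (1 = break).
    The score definitions are assumptions made for illustration.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of sentences")

    junctures = breaks = correct = breaks_hit = insertions = exact = 0
    for ref, pred in zip(reference, predicted):
        if len(ref) != len(pred):
            raise ValueError("sentence pair has a different number of junctures")
        junctures += len(ref)
        breaks += sum(ref)
        for r, p in zip(ref, pred):
            correct += int(r == p)                # juncture labeled correctly
            breaks_hit += int(r == 1 and p == 1)  # reference break found
            insertions += int(r == 0 and p == 1)  # break predicted where none exists
        exact += int(ref == pred)                 # whole sentence matches

    if junctures == 0:
        raise ValueError("no junctures to score")

    return {
        "junctures_correct": correct / junctures,
        "breaks_correct": breaks_hit / breaks if breaks else 0.0,
        "false_insertions": insertions / junctures,
        "sentence_exact_match": exact / len(reference),
    }


if __name__ == "__main__":
    # Toy data only, not taken from the paper.
    reference = [[0, 1, 0, 0], [0, 0, 1, 0, 1]]
    predicted = [[0, 1, 0, 0], [1, 0, 1, 0, 0]]
    for name, value in phrasing_scores(reference, predicted).items():
        print(f"{name}: {value:.3f}")
```

In the toy data, two wrong junctures in one sentence leave most junctures correct while only half of the sentences match exactly, which is the kind of gap between juncture-level and sentence-level scores that the abstract points out.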

