On the quality of an expressive audiovisual corpus: a case study of acted speech

Slim Ouni, Sara Dahmani, Vincent Colotte


In the context of developing an expressive audiovisual speech synthesis system, the quality of the audiovisual corpus from which the 3D visual data will be extracted is important. In this paper, we present a perceptual case study of the quality of the expressiveness of a set of emotions acted by a semi-professional actor. We analyzed this actor's production of a set of sentences with acted emotions through a human emotion-recognition task. We examined different modalities: audio, real video, and 3D-extracted data, as unimodal presentations and as bimodal presentations (paired with audio). The results of this study show the necessity of such a perceptual evaluation prior to further exploitation of the data for the synthesis system. The comparison of the modalities clearly shows which emotions need to be improved during production and how strongly the audio and visual components influence each other in emotional perception.


DOI: 10.21437/AVSP.2017-11

Cite as: Ouni, S., Dahmani, S., Colotte, V. (2017) On the quality of an expressive audiovisual corpus: a case study of acted speech. Proc. The 14th International Conference on Auditory-Visual Speech Processing, 53-57, DOI: 10.21437/AVSP.2017-11.


@inproceedings{Ouni2017,
  author={Slim Ouni and Sara Dahmani and Vincent Colotte},
  title={On the quality of an expressive audiovisual corpus: a case study of acted speech},
  year=2017,
  booktitle={Proc. The 14th International Conference on Auditory-Visual Speech Processing},
  pages={53--57},
  doi={10.21437/AVSP.2017-11},
  url={http://dx.doi.org/10.21437/AVSP.2017-11}
}