Thin slicing to predict viewer impressions of TED Talks

Ailbhe Cullen, Naomi Harte


Many paralinguistic challenges have looked at predicting affect, speaker state, or other attributes from short segments of speech of less than a minute. There are situations, however, where we want to predict how a user might label a talk or lecture of significantly longer duration. For example, would a viewer find a given talk funny? The question then is how to map long talks to single-word labels. In this paper, we rely on the concept of thin slicing, which states that humans make similar judgements on short segments of speech as they do on longer segments. We wish to find short segments that are representative of the talk and can be used to predict the user label. We explore this concept in order to predict user ratings of TED talks as inspiring, persuasive, and funny. In particular, we pose two questions. The first is how thin we can make our slices. Results show that longer slices, of up to a minute in duration, are more useful for the prediction of viewer ratings. The second is where in the video it is best to slice. We compare the performance of classification based on slices extracted from fixed points with that of classification based on slices extracted from salient regions, and find that prediction accuracy can be improved by choosing slices according to the speaker’s vocal behaviour or the audience’s reactions.


DOI: 10.21437/AVSP.2017-12

Cite as: Cullen, A., Harte, N. (2017) Thin slicing to predict viewer impressions of TED Talks. Proc. The 14th International Conference on Auditory-Visual Speech Processing, 58-63, DOI: 10.21437/AVSP.2017-12.


@inproceedings{Cullen2017,
  author={Ailbhe Cullen and Naomi Harte},
  title={Thin slicing to predict viewer impressions of TED Talks},
  year=2017,
  booktitle={Proc. The 14th International Conference on Auditory-Visual Speech Processing},
  pages={58--63},
  doi={10.21437/AVSP.2017-12},
  url={http://dx.doi.org/10.21437/AVSP.2017-12}
}