Very Short-Term Conflict Intensity Estimation Using Fisher Vectors

Gábor Gosztolya


The automatic detection of conflict situations from human speech has several applications, such as obtaining feedback on employees in call centers, the surveillance of public spaces, and other roles in human-computer interaction. Although several methods have been developed for automatic conflict detection, they were designed to operate on relatively long utterances. In practice, however, it would be beneficial to process much shorter speech segments. With the traditional workflow of paralinguistic speech processing, this would require properly annotated training and testing material consisting of short clips. In this study we show that Support Vector Regression models using Fisher vectors as features, even when trained on longer utterances, allow us to efficiently and accurately detect conflict intensity from very short audio segments. Even without reliable annotations of such short chunks, the mean scores of the predictions for short segments of the same original, longer utterance correlate well with the reference manual annotation. We also verify the validity of this approach by comparing the SVM predictions of the chunks with a manual annotation for the full and the 5-second-long cases. Our findings allow the construction of conflict detection systems with a smaller delay, which makes them more useful in practice.
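
The chunk-level evaluation described in the abstract can be illustrated with a minimal Python sketch. It covers only the bookkeeping: cutting an utterance into short chunks, averaging the chunk-level predictions that belong to the same utterance, and correlating those means with the utterance-level reference scores. Fisher vector extraction and the regression model are not shown. The function names, the frame-based chunking, and the use of Pearson's correlation are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def split_into_chunks(num_frames, chunk_len):
    """Return (start, end) frame ranges of consecutive, non-overlapping chunks.

    Assumption: chunks are cut by frame index; the last chunk may be shorter.
    """
    return [(start, min(start + chunk_len, num_frames))
            for start in range(0, num_frames, chunk_len)]


def mean_score_per_utterance(chunk_scores):
    """Average the chunk-level predictions of each utterance.

    chunk_scores: iterable of (utterance_id, predicted_score) pairs, where
    the scores would come from a regressor applied to each short chunk.
    """
    grouped = defaultdict(list)
    for utt_id, score in chunk_scores:
        grouped[utt_id].append(score)
    return {utt_id: sum(scores) / len(scores)
            for utt_id, scores in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, the per-utterance means returned by `mean_score_per_utterance` would be aligned with the reference annotation by utterance identifier and passed to `pearson`, which makes it possible to assess chunk-level predictions without chunk-level labels.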


DOI: 10.21437/Interspeech.2020-2349

Cite as: Gosztolya, G. (2020) Very Short-Term Conflict Intensity Estimation Using Fisher Vectors. Proc. Interspeech 2020, 3127-3131, DOI: 10.21437/Interspeech.2020-2349.


@inproceedings{Gosztolya2020,
  author={Gábor Gosztolya},
  title={{Very Short-Term Conflict Intensity Estimation Using Fisher Vectors}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3127--3131},
  doi={10.21437/Interspeech.2020-2349},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2349}
}