A Path Signature Approach for Speech Emotion Recognition

Bo Wang, Maria Liakata, Hao Ni, Terry Lyons, Alejo J. Nevado-Holgado, Kate Saunders

Automatic speech emotion recognition (SER) remains a difficult task within human-computer interaction, despite increasing interest in the research community. One key challenge is how to effectively integrate short-term characterisation of speech segments with long-term information such as temporal variations. Motivated by the numerical approximation theory of stochastic differential equations (SDEs), we propose the novel use of path signatures, which provide a pathwise definition of the solution to SDEs, for the integration of short speech frames. Furthermore, we propose a hierarchical tree structure of path signatures to capture both global and local information. A simple tree-based convolutional neural network (TBCNN) is used to learn the structural information stemming from dyadic path-tree signatures. Our experimental results on a widely used benchmark dataset demonstrate performance comparable to that of complex neural-network-based systems.
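As an illustration of the two objects named in the abstract, the following is a minimal sketch of a path signature truncated at depth 2, computed over a dyadic hierarchy of sub-paths. It is not the authors' implementation. It assumes the sequence of frame-level features is treated as a piecewise-linear path, and the function names `signature_depth2` and `dyadic_signatures`, the truncation depth, and the way segments are split are all hypothetical choices made for this example.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    path: array of shape (T, d), one row per frame (assumed layout).
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)        # segment increments, shape (T-1, d)
    level1 = increments.sum(axis=0)           # total displacement of the path
    # Displacement from the starting point at the beginning of each segment.
    start_offsets = path[:-1] - path[0]
    # Iterated integral of (X^i - X^i_0) dX^j, exact for linear segments.
    level2 = start_offsets.T @ increments + 0.5 * (increments.T @ increments)
    return level1, level2


def dyadic_signatures(path, depth):
    """Signatures on a dyadic tree: level k splits the path into 2**k pieces.

    Returns a list of levels; each level is a list of (level1, level2) pairs.
    Adjacent pieces share their boundary point so that no increment is lost.
    """
    path = np.asarray(path, dtype=float)
    last = len(path) - 1
    tree = []
    for level in range(depth + 1):
        bounds = np.linspace(0, last, 2 ** level + 1).astype(int)
        tree.append([signature_depth2(path[a:b + 1])
                     for a, b in zip(bounds[:-1], bounds[1:])])
    return tree


# Toy 2-D path: one step along the first axis, then one along the second.
toy_path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
toy_level1, toy_level2 = signature_depth2(toy_path)

# Dyadic tree over a random 9-frame, 3-feature sequence (synthetic data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(9, 3))
tree = dyadic_signatures(frames, depth=2)
```

In this sketch the root of the tree summarises the whole utterance, while deeper levels summarise progressively shorter spans, which is one way to read the abstract's "global and local information". The two halves at level 1 recombine into the root via Chen's identity, which gives a quick consistency check on the computation.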

 DOI: 10.21437/Interspeech.2019-2624

Cite as: Wang, B., Liakata, M., Ni, H., Lyons, T., Nevado-Holgado, A.J., Saunders, K. (2019) A Path Signature Approach for Speech Emotion Recognition. Proc. Interspeech 2019, 1661-1665, DOI: 10.21437/Interspeech.2019-2624.

  author={Bo Wang and Maria Liakata and Hao Ni and Terry Lyons and Alejo J. Nevado-Holgado and Kate Saunders},
  title={{A Path Signature Approach for Speech Emotion Recognition}},
  booktitle={Proc. Interspeech 2019},