Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach

Wenchao Du, Louis-Philippe Morency, Jeffrey Cohn, Alan W. Black

Despite the recent success of deep learning, it is generally difficult to apply end-to-end deep neural networks to small datasets, such as those from the health domain, because neural networks tend to over-fit. In addition, how neural models reach their decisions is not well understood. In this paper, we present a two-stage approach to acoustic-based classification of behavior markers related to mental health disorders: first, a dictionary and the mapping from speech signals to the dictionary are learned jointly by a deep autoencoder; then, the bag-of-words representation of speech is used for classification, using classifiers with simple decision boundaries. This deep bag-of-features approach offers more interpretability, while the deep autoencoder improves prediction by learning higher-level features with long-range dependencies, compared to previous work using only low-level descriptors. In addition, we demonstrate the use of labeled emotion recognition data from other domains to supervise acoustic word encoding in order to help predict psychological traits. Experiments are conducted on audio recordings of 65 clinically recorded interviews with self-reported levels of post-traumatic stress disorder (PTSD), depression, and rapport with the interviewers.
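
The bag-of-words step of the two-stage approach can be illustrated with a minimal sketch. It is not the authors' implementation: the autoencoder is omitted, the codebook is random rather than learned, and the function name, array shapes, and nearest-codeword assignment are assumptions made only for illustration. Each frame embedding is mapped to its closest dictionary entry, and a recording is summarized as a normalized histogram of those entries, which a simple classifier could then consume.

```python
import numpy as np

def bag_of_acoustic_words(frames, codebook):
    """Map frame embeddings (T, D) to a normalized histogram over K codewords.

    Hypothetical helper: in the paper the dictionary and the mapping are
    learned jointly by a deep autoencoder; here both are stand-ins.
    """
    # Squared Euclidean distance between every frame and every codeword: (T, K)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the nearest codeword for each frame: (T,)
    words = dists.argmin(axis=1)
    # Count how often each codeword occurs, then normalize to sum to 1
    counts = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return counts / counts.sum()

# Toy data (assumed sizes): 8 codewords, 100 frames, 4-dimensional embeddings
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
frames = rng.normal(size=(100, 4))
hist = bag_of_acoustic_words(frames, codebook)
```

The resulting vector has one entry per acoustic word, so the weights of a linear classifier trained on it can be read as per-word contributions, which is the kind of interpretability the abstract refers to.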

DOI: 10.21437/Interspeech.2019-3059

Cite as: Du, W., Morency, L., Cohn, J., Black, A.W. (2019) Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach. Proc. Interspeech 2019, 1428-1432, DOI: 10.21437/Interspeech.2019-3059.

  author={Wenchao Du and Louis-Philippe Morency and Jeffrey Cohn and Alan W. Black},
  title={{Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach}},
  booktitle={Proc. Interspeech 2019},