Mining Multimodal Repositories for Speech Affecting Diseases

Joana Correia, Bhiksha Raj, Isabel Trancoso, Francisco Teixeira

The motivation for this work is to contribute to the collection of large in-the-wild multimodal datasets in which the speech of the subject is affected by certain medical conditions. Our mining effort focuses on video blogs (vlogs), and as a proof of concept we selected three target conditions: depression, Parkinson's disease, and the common cold. Given the large scale of online repositories, we take advantage of existing retrieval algorithms to narrow the pool of candidate videos for a given disease-related query (e.g., "depression vlog"), and on top of that we apply several filtering techniques. These techniques explore audio, video, text, and metadata cues in order to retrieve vlogs that include a single speaker who, at some point, states that he or she is currently affected by the given disease. Applying straightforward NLP techniques to the automatically transcribed data showed that distinguishing between narratives of present and past experiences is harder than distinguishing between narratives of one's own experiences and those of someone else. The three resulting speech datasets were tested with neural networks trained on speech data collected in controlled conditions, yielding results only slightly below those achieved with the original test datasets.
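The abstract mentions straightforward NLP techniques over ASR transcripts to separate present from past narratives, and self-narratives from narratives about someone else, but does not specify which features were used. A minimal rule-based sketch of that idea, using hypothetical lexical cues (first-person present-tense vs. past-tense patterns, and third-person relation words), might look like this; the pattern lists and the `narrative_label` function are illustrative assumptions, not the paper's method:

```python
import re

# Hypothetical lexical cues for classifying a lowercased ASR transcript.
# These word lists are illustrative only; the paper does not publish its
# exact features.
PRESENT_SELF = re.compile(r"\bi\s+(am|have|feel|suffer)\b")
PAST_SELF = re.compile(r"\bi\s+(was|had|felt|suffered)\b")
OTHER = re.compile(r"\b(my|his|her)\s+(mom|dad|mother|father|friend|sister|brother)\b")

def narrative_label(transcript: str) -> str:
    """Label a transcript as a present self-narrative, a past
    self-narrative, a narrative about someone else, or unknown."""
    text = transcript.lower()
    # Mentions of another person's condition take precedence.
    if OTHER.search(text):
        return "other"
    present = len(PRESENT_SELF.findall(text))
    past = len(PAST_SELF.findall(text))
    if present >= past and present > 0:
        return "present-self"
    if past > 0:
        return "past-self"
    return "unknown"
```

For example, "I am dealing with depression right now" would be kept as a present self-narrative, while "My mom was diagnosed with Parkinson's" would be filtered out; the reported difficulty of the present/past distinction suggests real transcripts mix both tenses far more than these toy cues can resolve.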

DOI: 10.21437/Interspeech.2018-1806

Cite as: Correia, J., Raj, B., Trancoso, I., Teixeira, F. (2018) Mining Multimodal Repositories for Speech Affecting Diseases. Proc. Interspeech 2018, 2963-2967, DOI: 10.21437/Interspeech.2018-1806.

@inproceedings{correia18_interspeech,
  author={Joana Correia and Bhiksha Raj and Isabel Trancoso and Francisco Teixeira},
  title={Mining Multimodal Repositories for Speech Affecting Diseases},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2963--2967},
  doi={10.21437/Interspeech.2018-1806}
}