Using Voice Quality Supervectors for Affect Identification

Soo Jin Park, Amber Afshan, Zhi Ming Chua, Abeer Alwan

The voice quality of speech often conveys perceptible information about the speaker's affect. This study proposes perceptually important voice quality features for recognizing the affect expressed in speech excerpts from individuals with mental, neurological, and/or physical disabilities. The voice quality feature set consists of the fundamental frequency (F0); the harmonic amplitude differences between the first, second, and fourth harmonics and the harmonic near 2 kHz; the center frequencies and amplitudes of the first three formants; and cepstral peak prominence. The feature distribution of each utterance was represented with a supervector, and Gaussian mixture model (GMM) and support vector machine (SVM) classifiers were used for affect classification. Similar classification systems using MFCCs and the ComParE16 feature set were also implemented. The systems were fused by averaging the confidence scores of the individual classifiers. Applying the fused system to the Interspeech 2018 Atypical Affect Sub-Challenge task yielded unweighted average recalls of 43.9% and 41.0% on the development and test datasets, respectively. Additionally, we investigated clusters obtained by unsupervised learning to address gender-related differences.
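The abstract names three computational steps: a per-utterance supervector, fusion by averaging classifier confidences, and unweighted average recall (UAR) as the metric. The sketch below illustrates them under our own assumptions and is not the authors' implementation. The abstract does not specify how the supervector is built, so we assume a common construction: a diagonal-covariance GMM universal background model (UBM) whose means are MAP-adapted to each utterance's frames and stacked into one vector. The relevance factor of 16 is likewise an assumed value, and all function names are hypothetical. The GMM and SVM classifiers themselves are not shown; `fuse_by_mean` simply takes their per-class confidence matrices as input.

```python
import numpy as np


def gmm_posteriors(frames, weights, means, variances):
    """Per-frame component posteriors for a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    Returns a (T, K) array whose rows sum to 1.
    """
    diff = frames[:, None, :] - means[None, :, :]             # (T, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_gauss                   # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)         # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a K*D vector.

    Assumed recipe (mean-only MAP adaptation); the relevance factor is illustrative.
    """
    post = gmm_posteriors(frames, weights, means, variances)  # (T, K)
    n_k = post.sum(axis=0)                                    # soft frame counts, (K,)
    e_k = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]  # per-component data mean
    alpha = (n_k / (n_k + relevance))[:, None]                # data-vs-prior weight
    adapted = alpha * e_k + (1.0 - alpha) * means             # components with little data stay near the UBM
    return adapted.reshape(-1)


def fuse_by_mean(*confidences):
    """Average (N, C) per-class confidence matrices and return the argmax class per sample."""
    mean_conf = np.mean(np.stack(confidences), axis=0)
    return np.argmax(mean_conf, axis=1)


def unweighted_average_recall(y_true, y_pred, n_classes):
    """Mean of per-class recalls, so every class counts equally regardless of its size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c)
               for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(recalls))
```

In use, the rows of `frames` would be per-frame feature vectors (for example, the voice quality measures listed above), and each utterance's supervector would feed a classifier. UAR is used instead of plain accuracy because it keeps a majority class from dominating the score. For example, with true labels `[0, 0, 0, 0, 1, 1]` and predictions `[0, 0, 0, 0, 0, 1]`, accuracy is 5/6 but UAR is (1.0 + 0.5) / 2 = 0.75.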

DOI: 10.21437/Interspeech.2018-1401

Cite as: Park, S.J., Afshan, A., Chua, Z.M., Alwan, A. (2018) Using Voice Quality Supervectors for Affect Identification. Proc. Interspeech 2018, 157-161, DOI: 10.21437/Interspeech.2018-1401.
