Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets

Farhad Bin Siddique, Pascale Fung


We propose a multilingual personality classifier that takes text from social media and YouTube vlog transcriptions and maps it to the Big Five personality traits using a Convolutional Neural Network (CNN). We first train unsupervised bilingual word embeddings on an English-Chinese parallel corpus and use these word representations as input to our CNN. This enables the model to achieve relatively high cross-lingual and multilingual performance on Chinese texts, for example after training on the English dataset. We also train monolingual Chinese embeddings on a large Chinese text corpus and then train our CNN model on a Chinese dataset of conversational dialogue labeled with personality. We achieve an average F-score of 66.1 in the multilingual task, compared with 63.3 in the cross-lingual task and 63.2 in the monolingual task.
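The idea of feeding shared bilingual embeddings into a CNN can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy embedding table `EMB`, the three-dimensional vectors, the filter widths 1 to 3, the random untrained weights, and the helper names `embed`, `conv_max_pool`, and `predict_trait`. Translation pairs are given identical vectors here, whereas trained bilingual embeddings would only place them close together.

```python
import math
import random

DIM = 3  # toy embedding size (assumption); real embeddings are far larger

# Hypothetical shared English-Chinese embedding table. Translation pairs get
# identical vectors for clarity; trained bilingual embeddings would only
# place them near each other.
EMB = {
    "i": [0.1, 0.3, -0.2], "我": [0.1, 0.3, -0.2],
    "love": [0.7, -0.1, 0.4], "爱": [0.7, -0.1, 0.4],
    "parties": [0.2, 0.8, 0.5], "派对": [0.2, 0.8, 0.5],
}

def embed(tokens, min_len):
    """Look up each token; unknown tokens and padding become zero vectors."""
    vectors = [EMB.get(t, [0.0] * DIM) for t in tokens]
    while len(vectors) < min_len:
        vectors.append([0.0] * DIM)
    return vectors

def conv_max_pool(vectors, filters, biases):
    """Slide each filter over consecutive word vectors, apply ReLU,
    then keep the maximum activation per filter (max-over-time pooling)."""
    pooled = []
    for filt, bias in zip(filters, biases):
        width = len(filt)  # number of consecutive words the filter spans
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(vectors) - width + 1):
            act = bias
            for k in range(width):
                act += sum(w * x for w, x in zip(filt[k], vectors[start + k]))
            best = max(best, act)
        pooled.append(best)
    return pooled

def predict_trait(tokens, filters, biases, out_weights, out_bias):
    """Return a probability for one binary trait label (e.g. high vs. low)."""
    max_width = max(len(f) for f in filters)
    features = conv_max_pool(embed(tokens, max_width), filters, biases)
    score = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Random, untrained weights: the point is the data flow, not the accuracy.
rng = random.Random(0)
widths = [1, 2, 3]
filters = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(w)]
           for w in widths]
biases = [0.0] * len(filters)
out_weights = [rng.uniform(-1, 1) for _ in filters]

p_en = predict_trait(["i", "love", "parties"], filters, biases, out_weights, 0.0)
p_zh = predict_trait(["我", "爱", "派对"], filters, biases, out_weights, 0.0)
print(p_en, p_zh)
```

Because both sentences map to the same vectors in the shared space, the same filters produce the same features for either language. This is the property that would let a classifier trained on English text be applied to Chinese text in a setup of this kind.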


DOI: 10.21437/Interspeech.2017-1379

Cite as: Siddique, F.B., Fung, P. (2017) Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets. Proc. Interspeech 2017, 3271-3275, DOI: 10.21437/Interspeech.2017-1379.


@inproceedings{Siddique2017,
  author={Farhad Bin Siddique and Pascale Fung},
  title={Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3271--3275},
  doi={10.21437/Interspeech.2017-1379},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1379}
}