Learning to Determine Who is the Better Speaker

Timo Baumann


Speech can be more or less likable in various ways, and comparing speakers by likability has important applications such as speaker selection or matching. Determining the likability of a speaker is a difficult task, which can be simplified by breaking it down into pairwise preference decisions. Using a corpus of 5440 pairwise preference ratings collected previously through crowd-sourcing, we train classifiers to determine which of two speakers is “better”. We find that modeling the speech feature sequences using LSTMs outperforms conventional methods that pre-aggregate feature averages by a large margin, indicating that prosodic structure should be taken into account when determining speech quality. Our classifier reaches an accuracy of 97 % for coarse-grained decisions, where the difference in speech quality between the two stimuli is relatively large.
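
The contrast between pre-aggregated averages and feature sequences can be illustrated with a minimal sketch. It is not the paper's classifier or data: the pitch contours below are hypothetical values chosen for illustration, and the helper names `aggregate` and `deltas` are assumptions. It only shows that two contours with opposite shapes can share the same average, so a model that sees averages alone cannot tell them apart, whereas an order-sensitive view (as a sequence model such as an LSTM would receive) can.

```python
from statistics import mean

# Hypothetical pitch contours in Hz, one value per frame (not corpus data).
rising = [100.0, 110.0, 120.0, 130.0, 140.0]
falling = list(reversed(rising))

def aggregate(seq):
    """Pre-aggregated representation: one average per utterance."""
    return mean(seq)

def deltas(seq):
    """Order-sensitive representation: frame-to-frame changes."""
    return [b - a for a, b in zip(seq, seq[1:])]

# The averages coincide, so the aggregated view loses the contour shape.
same_mean = aggregate(rising) == aggregate(falling)
# The frame-to-frame changes differ in sign, so the sequence view keeps it.
same_shape = deltas(rising) == deltas(falling)

print("identical averages:", same_mean)
print("identical contours:", same_shape)
```

Frame-to-frame deltas stand in here for any representation that preserves temporal order; the abstract does not specify the features or the network configuration.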


DOI: 10.21437/SpeechProsody.2018-165

Cite as: Baumann, T. (2018) Learning to Determine Who is the Better Speaker. Proc. 9th International Conference on Speech Prosody 2018, 819-822, DOI: 10.21437/SpeechProsody.2018-165.


@inproceedings{Baumann2018,
  author={Timo Baumann},
  title={Learning to Determine Who is the Better Speaker},
  year=2018,
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},
  pages={819--822},
  doi={10.21437/SpeechProsody.2018-165},
  url={http://dx.doi.org/10.21437/SpeechProsody.2018-165}
}