A Comparison of Sentence-Level Speech Intelligibility Metrics

Alexander Kain, Max Del Giudice, Kris Tjaden


We examine existing and novel automatically-derived acoustic metrics that are predictive of speech intelligibility. We hypothesize that the degree of variability in feature space is correlated with the extent of a speaker’s phonemic inventory and the magnitude of their articulatory displacements, and thus with their degree of perceived speech intelligibility. We begin by using fully-automatic F1/F2 formant frequency trajectories, both for vowel space area calculation and as input to a proposed class-separability metric. We then switch to representing vowels by means of short-term spectral features, and measure vowel separability in that space. Finally, we consider the case where phonetic labels are unavailable; here, we calculate short-term spectral features for the entire speech utterance and then estimate their entropy based on the length of a minimum spanning tree. In an alternative approach, we propose to first segment the speech signal using a hidden Markov model, and then calculate spectral feature separability based on the automatically-derived classes. We apply all approaches to a database of healthy controls as well as speakers with mild dysarthria, and report the resulting coefficients of determination.
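The abstract's first metric, vowel space area, is conventionally computed as the area of the convex hull of a speaker's (F1, F2) formant measurements. The paper itself provides no code; the sketch below is a minimal pure-Python illustration of that standard computation (function names are my own, not from the paper), using Andrew's monotone chain for the hull and the shoelace formula for the area.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); <= 0 means a clockwise (non-left) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, dropping the endpoints that are duplicated in both chains.
    return lower[:-1] + upper[:-1]

def vowel_space_area(f1f2: List[Point]) -> float:
    """Area (shoelace formula) of the convex hull of (F1, F2) points,
    e.g. in Hz^2 if formants are given in Hz."""
    hull = convex_hull(f1f2)
    if len(hull) < 3:
        return 0.0  # degenerate: fewer than 3 distinct non-collinear points
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0
```

A larger hull area is commonly read as larger articulatory working space; interior points (e.g. reduced vowels) do not change the result, which is one reason the hull-based measure is robust to centralized tokens.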

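The abstract does not specify its class-separability metric; as a hedged illustration of the general idea (classes that are compact and far apart should score higher), here is the classical one-dimensional Fisher discriminant ratio, applied per feature. The function name and interface are my own, not taken from the paper.

```python
from statistics import mean

def fisher_ratio(classes):
    """One-dimensional Fisher discriminant ratio:
    between-class variance of the class means divided by the pooled
    within-class variance. `classes` is a list of lists of scalar features
    (e.g. one list of F1 values per vowel category).
    """
    class_means = [mean(c) for c in classes]
    grand_mean = mean(x for c in classes for x in c)
    n_total = sum(len(c) for c in classes)
    between = sum(
        len(c) * (m - grand_mean) ** 2 for c, m in zip(classes, class_means)
    ) / n_total
    within = sum(
        (x - m) ** 2 for c, m in zip(classes, class_means) for x in c
    ) / n_total
    return between / within
```

Under the paper's hypothesis, speakers with reduced articulatory displacements would show more overlapping vowel classes and hence a lower separability score.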

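For the label-free case, the abstract estimates the entropy of short-term spectral features from the length of a minimum spanning tree. A minimal sketch of that family of estimators (in the style of graph-based Rényi entropy estimation) is below; it is my own illustration, not the paper's implementation, and it omits the dimension-dependent bias constant, so the values are only meaningful for comparing speakers at matched sample size and dimension. Assumes Python 3.8+ (for `math.dist`) and feature dimension d >= 2.

```python
import math
from typing import List, Sequence

def mst_length(points: List[Sequence[float]]) -> float:
    """Total Euclidean edge length of the minimum spanning tree over `points`,
    via Prim's algorithm on the complete graph (O(n^2); fine for one utterance)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def entropy_estimate(points: List[Sequence[float]]) -> float:
    """Rényi entropy estimate (order alpha = (d-1)/d) from the MST length L:
    H ~ d * log(L / n**alpha), up to an additive constant that depends only
    on d and that is dropped here. Tightly clustered samples give a short
    MST and a low score; widely spread samples give a high score."""
    n, d = len(points), len(points[0])
    alpha = (d - 1) / d
    return d * math.log(mst_length(points) / n ** alpha)
```

The intuition matches the paper's hypothesis: a speaker whose spectral features spread over a larger region (more distinct articulations) yields a longer MST and thus a higher entropy estimate.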
DOI: 10.21437/Interspeech.2017-567

Cite as: Kain, A., Del Giudice, M., Tjaden, K. (2017) A Comparison of Sentence-Level Speech Intelligibility Metrics. Proc. Interspeech 2017, 1148-1152, DOI: 10.21437/Interspeech.2017-567.


@inproceedings{Kain2017,
  author={Alexander Kain and Max Del Giudice and Kris Tjaden},
  title={A Comparison of Sentence-Level Speech Intelligibility Metrics},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={1148--1152},
  doi={10.21437/Interspeech.2017-567},
  url={http://dx.doi.org/10.21437/Interspeech.2017-567}
}