Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System

Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino


The measurement of speech intelligibility (SI) still relies mainly on time-consuming and expensive subjective experiments because no versatile objective measure can predict SI. One promising candidate for SI prediction is an approach based on a deep neural network (DNN)-based automatic speech recognition (ASR) system, given the recent major advances in such systems. In this paper, we propose and evaluate SI prediction methods based on the posteriors of DNN-based ASR systems. The posteriors, which are the probabilities of phones given acoustic features, are derived using forced alignments between clean speech and a phone sequence. We evaluated several variants of the posteriors to improve prediction performance. In our experiments, a prediction method using a squared cumulative posterior probability achieved better accuracy than conventional SI predictors based on the well-established objective measures STOI and eSTOI.
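
As a rough illustration of the idea of reading posteriors along a forced alignment, the following minimal sketch picks, for each frame, the posterior of the phone that the alignment assigns to that frame and averages the result. Everything here is an assumption made for illustration: the function names, the (frames x phones) array layout, the toy numbers, and the use of a plain mean as the summary. The abstract does not define the squared cumulative posterior probability, so that variant is not reproduced here.

```python
import numpy as np


def aligned_posteriors(posteriors, alignment):
    """Return, per frame, the posterior of the phone given by the alignment.

    posteriors: array of shape (num_frames, num_phones); each row is assumed
                to be the ASR acoustic model's phone posterior for one frame.
    alignment:  array of shape (num_frames,); the phone index assigned to each
                frame (assumed to come from a forced alignment of clean speech).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    alignment = np.asarray(alignment, dtype=int)
    if posteriors.ndim != 2 or alignment.ndim != 1:
        raise ValueError("expected a 2-D posterior matrix and a 1-D alignment")
    if posteriors.shape[0] != alignment.shape[0]:
        raise ValueError("posteriors and alignment must cover the same frames")
    # Integer array indexing selects one entry per row.
    return posteriors[np.arange(alignment.shape[0]), alignment]


def mean_aligned_posterior(posteriors, alignment):
    """Hypothetical summary score: the mean posterior of the aligned phones."""
    return float(aligned_posteriors(posteriors, alignment).mean())


# Toy data (made up): 4 frames, 3 phones.
toy_posteriors = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])
toy_alignment = np.array([0, 0, 1, 2])

score = mean_aligned_posterior(toy_posteriors, toy_alignment)
```

In this sketch a higher score means the recognizer assigns more probability to the expected phones, which is the kind of quantity a predictor could then map to a measured intelligibility score.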


DOI: 10.21437/Interspeech.2020-1591

Cite as: Arai, K., Araki, S., Ogawa, A., Kinoshita, K., Nakatani, T., Irino, T. (2020) Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System. Proc. Interspeech 2020, 1156-1160, DOI: 10.21437/Interspeech.2020-1591.


@inproceedings{Arai2020,
  author={Kenichi Arai and Shoko Araki and Atsunori Ogawa and Keisuke Kinoshita and Tomohiro Nakatani and Toshio Irino},
  title={{Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1156--1160},
  doi={10.21437/Interspeech.2020-1591},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1591}
}