DNN No-Reference PSTN Speech Quality Prediction

Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

Classic public switched telephone networks (PSTN) are often a black box for VoIP network providers, who have no access to performance indicators such as delay or packet loss. Only the degraded output speech signal can be used to monitor the speech quality of these networks. However, the current state-of-the-art speech quality models are not reliable enough to be used for live monitoring. One reason is that PSTN distortions can be specific to the provider and country, which makes it difficult to train a model that generalizes well across different PSTNs. In this paper, we present a new open-source PSTN speech quality test set with over 1000 crowdsourced real phone calls. Our proposed no-reference model outperforms the full-reference POLQA and the no-reference P.563 on the validation and test sets. Further, we analyze the influence of file cropping on the perceived speech quality, and the influence of the number of ratings and the training set size on the model accuracy.

DOI: 10.21437/Interspeech.2020-2760

Cite as: Mittag, G., Cutler, R., Hosseinkashi, Y., Revow, M., Srinivasan, S., Chande, N., Aichner, R. (2020) DNN No-Reference PSTN Speech Quality Prediction. Proc. Interspeech 2020, 2867-2871, DOI: 10.21437/Interspeech.2020-2760.

  author={Gabriel Mittag and Ross Cutler and Yasaman Hosseinkashi and Michael Revow and Sriram Srinivasan and Naglakshmi Chande and Robert Aichner},
  title={{DNN No-Reference PSTN Speech Quality Prediction}},
  booktitle={Proc. Interspeech 2020},