Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression

Meet H. Soni, Hemant A. Patil


In this paper, we propose a new quality assessment method for synthesized speech. Unlike previous approaches, which use a Hidden Markov Model (HMM) trained on natural utterances as a reference model to predict the quality of synthesized speech, the proposed approach uses knowledge about synthesized speech while training the model. The previous approach has been applied successfully to quality assessment of synthesized speech for the German language. However, it gave poor results for English-language databases such as the Blizzard Challenge 2008 and 2009 databases. We pose quality assessment of synthesized speech as a regression problem: the mapping between statistical properties of spectral features extracted from the speech signal and the corresponding speech quality score (mean opinion score, MOS) is learned using Support Vector Regression (SVR). All experiments were performed on the Blizzard Challenge databases of 2008, 2009, 2010 and 2012. The results show that including knowledge about synthesized speech during training improves the performance of the quality assessment system. Moreover, the accuracy of the quality assessment system depends heavily on the kind of synthesis system used for signal generation. On the Blizzard 2008 and 2009 databases, the proposed approach gives correlations of 0.28 and 0.49, respectively, when about 17% of the data is used for training. The previous approach gives correlations of 0.3 and 0.09, respectively, using spectral features. For the Blizzard 2012 database, the proposed approach gives a correlation of 0.8 using 12% of the available data for training.
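As a rough illustration of the two ends of such a pipeline, the sketch below collapses frame-level spectral features into one fixed-length vector per utterance and scores predictions against subjective MOS with a Pearson correlation. The abstract does not say which statistics or which correlation measure are used, so the choice of per-dimension mean and standard deviation, the Pearson coefficient, the function names `utterance_stats` and `pearson`, and all toy numbers are assumptions made for illustration. The SVR step itself, which would map such vectors to MOS, is left out.

```python
import math
from statistics import mean, pstdev


def utterance_stats(frames):
    """Collapse frame-level feature vectors into one utterance-level vector.

    Assumption: per-dimension mean followed by per-dimension (population)
    standard deviation; the abstract does not name the statistics used.
    """
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy utterance: three frames of a hypothetical 2-dimensional spectral feature.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
features = utterance_stats(frames)  # length 4: two means, two std devs

# Toy evaluation: made-up subjective scores and made-up model predictions.
subjective_mos = [2.0, 3.0, 4.0, 5.0]
predicted_mos = [2.5, 3.0, 3.5, 4.0]
r = pearson(subjective_mos, predicted_mos)
```

In a full system, vectors like `features` would be the regressor inputs and a value like `r` would be computed on held-out utterances only.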


DOI: 10.21437/SSW.2016-21

Cite as

Soni, M.H., Patil, H.A. (2016) Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression. Proc. 9th ISCA Speech Synthesis Workshop, 127-133.

Bibtex
@inproceedings{Soni+2016,
author={Meet H. Soni and Hemant A. Patil},
title={Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression},
year=2016,
booktitle={9th ISCA Speech Synthesis Workshop},
doi={10.21437/SSW.2016-21},
url={http://dx.doi.org/10.21437/SSW.2016-21},
pages={127--133}
}