Algorithms for robust automatic speech recognition (ASR) are often evaluated on artificial noisy speech rather than realistic noisy speech. In this paper we compare the ASR performance on speech with artificially added noise to the performance on realistic noisy speech. All data were recorded during the same recording campaign, with nearly identical channel characteristics. The simulation process takes into account all major characteristics of the noisy reference data. Clean speech, noisy speech and simulated speech are compared for different aspects of robust ASR, including noise reduction by spectral subtraction and the ETSI robust front end. The results show that artificial noisy speech, even in very controlled simulation environments, is not very similar to realistic noisy data and is not a full substitute for it. While the tendencies of the improvement for artificial and realistic data are similar for the evaluated approaches, the magnitude can be quite different.
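For readers unfamiliar with how artificial noisy speech is typically produced, the following is a minimal sketch of additive noise mixing at a target signal-to-noise ratio (SNR). It is not the simulation process used in the paper, which is not specified in this abstract. The function names `add_noise_at_snr` and `measure_snr_db`, the use of plain Python lists as signals, and the choice to loop the noise to the speech length are all assumptions made for illustration.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def measure_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def add_noise_at_snr(speech, noise, snr_db):
    """Add noise to clean speech so the mixture has the requested SNR.

    Hypothetical helper: returns the noisy signal and the scaled noise.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Assumption: loop the noise recording so it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]

    speech_power = mean_power(speech)
    noise_power = mean_power(looped)
    if noise_power == 0.0:
        raise ValueError("noise must have non-zero power")

    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))

    scaled = [gain * n for n in looped]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled
```

A mixture of this kind only matches the global SNR of the reference data. It does not model effects such as speakers changing their voice in noisy conditions or channel differences, which is one possible reason why purely additive simulation can diverge from realistic recordings.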
Bibliographic reference. Winkler, Thomas (2011): "How realistic is artificially added noise?", In INTERSPEECH-2011, 2605-2608.