Optimizing Speech Recognition Evaluation Using Stratified Sampling

Janne Pylkkönen, Thomas Drugman, Max Bisani

Producing large enough quantities of high-quality transcriptions for accurate and reliable evaluation of an automatic speech recognition (ASR) system can be costly. It is therefore desirable to minimize the manual transcription work required to produce metrics with an agreed precision. In this paper we demonstrate how to improve ASR evaluation precision using stratified sampling. We show that by altering the sampling, the deviations observed in the error metrics can be reduced by up to 30% compared to random sampling, or alternatively, the same precision can be obtained on about 30% smaller datasets. We compare different variants for conducting stratified sampling, including a novel sample allocation scheme tailored for word error rate. Experimental evidence is provided to assess the effect of different sampling schemes on evaluation precision.
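The core idea can be illustrated with a short sketch. The snippet below is not the paper's specific allocation scheme (which is tailored for word error rate and described in the full text); it is a generic illustration of stratified WER estimation using the classical Neyman allocation, where the number of samples drawn from stratum h is proportional to N_h * S_h (stratum size times estimated error standard deviation). All function names and the assumption that per-stratum error spread can be estimated in advance (e.g., from ASR confidence scores) are hypothetical.

```python
import random

def neyman_allocation(stratum_sizes, stratum_stds, budget):
    """Allocate a transcription budget across strata with n_h proportional
    to N_h * S_h (Neyman allocation).

    stratum_sizes: N_h, number of utterances in each stratum
    stratum_stds:  S_h, estimated std of per-utterance error in each stratum
                   (in practice a proxy, e.g. derived from recognizer confidence)
    budget:        total number of utterances to transcribe
    """
    weights = [n * s for n, s in zip(stratum_sizes, stratum_stds)]
    total = sum(weights)
    if total == 0:
        # No variance information: fall back to proportional allocation.
        weights, total = stratum_sizes, sum(stratum_sizes)
    # Round, keep at least one sample per stratum, never exceed stratum size.
    return [min(n, max(1, round(budget * w / total)))
            for n, w in zip(stratum_sizes, weights)]

def stratified_wer_estimate(strata, allocation, rng=random):
    """Estimate overall WER by sampling `allocation[h]` utterances from each
    stratum and combining per-stratum WER estimates weighted by word counts.

    strata: list of strata, each a list of (word_errors, word_count) tuples
    """
    total_words = sum(w for stratum in strata for _, w in stratum)
    estimate = 0.0
    for utts, n_h in zip(strata, allocation):
        sample = rng.sample(utts, min(n_h, len(utts)))
        stratum_words = sum(w for _, w in utts)
        sample_errs = sum(e for e, _ in sample)
        sample_words = sum(w for _, w in sample)
        if sample_words:
            # Weight each stratum's sampled WER by its share of all words.
            estimate += (stratum_words / total_words) * (sample_errs / sample_words)
    return estimate
```

The variance reduction comes from spending more of the transcription budget on strata where the error metric varies most, so that the combined estimate fluctuates less across sampling runs than a simple random sample of the same size.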

DOI: 10.21437/Interspeech.2016-1364

Cite as

Pylkkönen, J., Drugman, T., Bisani, M. (2016) Optimizing Speech Recognition Evaluation Using Stratified Sampling. Proc. Interspeech 2016, 3106-3110.

@inproceedings{Pylkkonen2016,
  author={Janne Pylkkönen and Thomas Drugman and Max Bisani},
  title={Optimizing Speech Recognition Evaluation Using Stratified Sampling},
  booktitle={Interspeech 2016},
  year={2016},
  pages={3106--3110},
  doi={10.21437/Interspeech.2016-1364}
}