4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
It is a well-established fact that human performance exceeds that of computers by orders of magnitude on a wide range of speech recognition tasks. However, there is a widespread belief that the gap between human and machine performance has narrowed considerably on restricted problems. Yet there are few extensive comparisons of performance on tasks involving large vocabulary continuous speech recognition (LVCSR) and low signal-to-noise ratios (SNRs). Human evaluations on LVCSR tasks highlight a number of interesting issues. For example, familiarity with the domain plays a crucial role in human performance. We conducted several experiments that extensively characterize human performance on LVCSR tasks over two standard evaluation corpora: ARPA's CSR'94 Spoke 10 and CSR'95 Hub 3. We demonstrate that human performance is at least an order of magnitude better than the best machine performance, and that human performance is fairly robust to a number of factors that typically degrade machine performance: SNR, speaking rate and style, microphone type, and ambient noise. In fact, human performance remained remarkably consistent across evaluation paradigms, and to some extent was artificially limited by a listener's attention span.
Bibliographic reference: Deshmukh, N. / Duncan, R. J. / Ganapathiraju, A. / Picone, J. (1996): "Benchmarking human performance for continuous speech recognition", in Proc. ICSLP-1996, pp. 2486-2489.