International Workshop on Spoken Language Translation (IWSLT) 2011
San Francisco, CA, USA
Although many important scientific advances have taken place in automatic speech
recognition research, we have also encountered a number of practical limitations which hinder
a widespread deployment of applications and services. In most speech recognition tasks,
human subjects produce one to two orders of magnitude fewer errors than machines. One of
the most significant differences is that human subjects are far more flexible and
adaptive than machines to variations in speech, including speaker individuality, speaking
style, additive noise, and channel distortion. How to train and adapt statistical models for
speech recognition using a limited amount of data is one of the most important research
issues.
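As one concrete illustration of adapting a statistical model with only a small amount of data, the sketch below shows maximum a posteriori (MAP) adaptation of a single Gaussian mean, a standard technique in ASR acoustic modeling. This is a minimal, hypothetical example: the prior weight `tau` and the toy adaptation data are assumptions for illustration, not taken from the abstract.

```python
# Minimal sketch: MAP adaptation of a Gaussian mean with limited data.
# The prior mean comes from a speaker-independent model; a small amount
# of speaker-specific data shifts the estimate toward the new speaker.
# All numeric values here are illustrative assumptions.

def map_adapt_mean(prior_mean, data, tau=10.0):
    """Interpolate the prior mean with the sample mean of `data`.

    tau controls how strongly the prior is trusted: with little data,
    the adapted mean stays close to the speaker-independent prior;
    as more adaptation data arrives, the estimate moves toward the
    observed sample mean.
    """
    n = len(data)
    return (tau * prior_mean + sum(data)) / (tau + n)

# Speaker-independent mean, plus a few adaptation frames from a new speaker.
prior = 0.0
frames = [1.0, 1.2, 0.8, 1.1]
adapted = map_adapt_mean(prior, frames)
```

With only four adaptation frames, the adapted mean stays much closer to the prior than to the sample mean, which is exactly the behavior wanted when adaptation data is scarce.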
What we know about human speech processing and the natural variation of speech is still very limited. More effort should be spent on clarifying the mechanisms underlying speaker-to-speaker variability, and on devising methods that simultaneously model multiple sources of variation through statistical analysis of large-scale databases. Future systems will also need efficient ways of representing, storing, and retrieving various knowledge resources.
Data-intensive science is rapidly emerging in the scientific and computing research communities. The speech databases/corpora used in ASR research and development typically contain 100 to 1,000 hours of utterances, which is too small given the variety of sources of variation. Several problems must be solved before huge speech databases can be constructed and utilized efficiently, and such databases will be essential to next-generation ASR systems.
Bibliographic reference. Furui, Sadaoki (2011): "Data-intensive approaches for ASR", In IWSLT-2011 (abstract).