International Workshop on Spoken Language Translation (IWSLT) 2011

San Francisco, CA, USA
December 8-9, 2011

Data-Intensive Approaches for ASR

Sadaoki Furui

Tokyo Institute of Technology, Japan

Although many important scientific advances have taken place in automatic speech recognition (ASR) research, we have also encountered a number of practical limitations that hinder the widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines. One of the most significant differences is that human subjects are far more flexible and adaptive than machines when faced with the many sources of variation in speech, including speaker individuality, speaking style, additive noise, and channel distortions. How to train and adapt statistical models for speech recognition using a limited amount of data is one of the most important research issues.
   What we know about human speech processing and the natural variation of speech is very limited. More effort is needed to clarify, in particular, the mechanism underlying speaker-to-speaker variability, and to devise a method for simultaneously modeling multiple sources of variation based on statistical analysis of large-scale databases. Future systems need an efficient way of representing, storing, and retrieving various knowledge resources.
   Data-intensive science is rapidly emerging in the scientific and computing research communities. The speech databases/corpora used in ASR research and development typically contain 100 to 1,000 hours of utterances, which is too small given the variety of sources of variation. We need to focus on solving the various problems that stand in the way of efficiently constructing and utilizing huge speech databases, which will be essential to next-generation ASR systems.

Bibliographic reference. Furui, Sadaoki (2011): "Data-intensive approaches for ASR", in IWSLT-2011 (abstract).