Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Multi-Stream ASR: An Oracle Perspective

Hemant Misra, Jithendra Vepa, Hervé Bourlard

IDIAP Research Institute, Switzerland

Multi-stream automatic speech recognition (ASR) systems are usually shown to outperform single-stream systems, especially in noisy test conditions. Indeed, there is a trend in ASR today towards using more and more acoustic features, combined either at the input (early integration, possibly preceded by some linear or nonlinear transformation) or later in the recognition process (e.g., at the level of likelihoods, referred to as late integration). However, to guarantee optimal exploitation of such multi-stream systems, we need to use features that are as complementary as possible, while also using the best combination method for those streams. In practice, it is never clear whether we fully exploit the potential of the available streams. This paper investigates an ‘oracle’ test to provide some insight into these issues. Although it does not provide an absolute upper bound on performance, the oracle is shown to indicate the complementarity of the feature streams used and to provide a reasonable reference target for evaluating combination strategies. The oracle analysis is supported by results obtained on the Numbers95 database using different feature streams and an entropy-based combination method.
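
The abstract names an entropy-based combination method and an oracle test without defining either. As a rough illustration only, the Python sketch below assumes two things that are not stated in the abstract: that streams are combined by weighting each stream's frame-level class posteriors with its inverse entropy, and that the oracle counts an item as correct whenever at least one stream recognizes it correctly. All function names and the `eps` parameter are hypothetical, and the paper's actual definitions may differ.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of one frame's class-posterior vector."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def inverse_entropy_combine(stream_posteriors, eps=1e-8):
    """Combine one frame's posteriors from several streams.

    Assumption: each stream is weighted by 1 / entropy, normalized over
    streams, so confident (low-entropy) streams count more. `eps` is a
    hypothetical guard against division by zero for one-hot posteriors.
    """
    inverse = [1.0 / (entropy(p) + eps) for p in stream_posteriors]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    n_classes = len(stream_posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, stream_posteriors))
        for k in range(n_classes)
    ]
    return combined, weights


def oracle_accuracy(stream_decisions, reference):
    """Fraction of items that at least one stream recognizes correctly.

    Assumption: this "any stream correct" rule is one plausible reading
    of an oracle test, not necessarily the paper's definition.
    """
    hits = sum(
        any(decisions[i] == ref for decisions in stream_decisions)
        for i, ref in enumerate(reference)
    )
    return hits / len(reference)


if __name__ == "__main__":
    confident = [0.90, 0.05, 0.05]   # low-entropy stream
    uncertain = [1 / 3, 1 / 3, 1 / 3]  # maximum-entropy stream
    combined, weights = inverse_entropy_combine([confident, uncertain])
    print("weights:", weights)
    print("combined:", combined)
    print("oracle:", oracle_accuracy([[1, 2, 3], [1, 0, 3]], [1, 0, 0]))
```

Under these assumptions, the gap between the oracle score and the score of a given combination rule suggests how much of the streams' complementarity that rule leaves unused.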


Bibliographic reference.  Misra, Hemant / Vepa, Jithendra / Bourlard, Hervé (2006): "Multi-stream ASR: an oracle perspective", In INTERSPEECH-2006, paper 1663-Thu2CaP.3.