INTERSPEECH 2006 - ICSLP
Multi-stream automatic speech recognition (ASR) systems are usually shown to outperform single-stream systems, especially in noisy test conditions. Indeed, there is a trend today in ASR towards using more and more acoustic features, combined either at the input (early integration, possibly preceded by some linear or nonlinear transformation) or later in the recognition process (e.g., at the level of likelihoods, then referred to as late integration). However, to guarantee optimal exploitation of such multi-stream systems, we need to use features that are as complementary as possible, while also using the best combination method for those streams. In practice, it is never clear whether we fully exploit the potential of the available streams. This paper investigates an ‘oracle’ test to provide some insight into these issues. Although it does not provide an absolute upper bound on performance, the oracle is shown to indicate the complementarity of the feature streams used, and to provide a reasonable reference target for evaluating combination strategies. The oracle analysis is supported by results obtained on the Numbers95 database using different feature streams and an entropy-based combination method.
Bibliographic reference. Misra, Hemant / Vepa, Jithendra / Bourlard, Hervé (2006): "Multi-stream ASR: an oracle perspective", In INTERSPEECH-2006, paper 1663-Thu2CaP.3.