Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR

Louis ten Bosch, Bert Cranen, Yang Sun

In previous studies (on Aurora2), it was found that merging a posteriori probability streams from different classifiers (GMM, MLP, Sparse Coding) can improve the noise robustness of ASR. Maximizing word accuracy required the stream weights to depend systematically on the specific input streams and on the SNR. The tuning of the weights, however, was largely a matter of trial and error and typically involved a laborious grid search. In this paper, we propose two fundamental, analytical methods to better understand these empirical findings. To that end, we maximize the trustworthiness of merged streams as a function of the stream weights. Trustworthiness is defined as the probability that the winning state in a probability vector correctly predicts a golden reference state obtained by forced alignment. Even though our approach is not directly equivalent to optimizing word accuracy, both methods appear highly useful for gaining insight into the stream properties that determine the success of a given merge (or the lack thereof). Furthermore, both methods clearly support the trends observed in the grid-search-based empirical results.
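The definition of trustworthiness can be made concrete with a small sketch. The abstract does not state the merging rule, so the code below assumes a log-linear (weighted geometric) combination with a single weight `w` on the first stream and `1 - w` on the second. The function names (`merge`, `trustworthiness`, `best_weight`) and the toy posteriors are hypothetical and are not taken from the paper. Trustworthiness is estimated empirically as the fraction of frames in which the winning state of the merged vector equals the reference state, and the weight is chosen by a coarse grid search rather than by the paper's analytical methods.

```python
import math


def merge(p1, p2, w):
    """Merge two posterior vectors with an ASSUMED log-linear rule.

    w weights stream 1 and (1 - w) weights stream 2; the result is renormalized.
    """
    eps = 1e-12  # guards against log(0)
    scores = [w * math.log(a + eps) + (1.0 - w) * math.log(b + eps)
              for a, b in zip(p1, p2)]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def trustworthiness(stream1, stream2, reference, w):
    """Fraction of frames whose winning merged state matches the reference state."""
    hits = 0
    for p1, p2, ref in zip(stream1, stream2, reference):
        merged = merge(p1, p2, w)
        winner = max(range(len(merged)), key=merged.__getitem__)
        hits += int(winner == ref)
    return hits / len(reference)


def best_weight(stream1, stream2, reference, steps=10):
    """Coarse grid search over w in [0, 1]; returns the first maximizer."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: trustworthiness(stream1, stream2, reference, w))


# Toy data (hypothetical): two frames, three states, reference states from
# an imagined forced alignment. Each stream alone gets one frame wrong.
stream_a = [[0.60, 0.30, 0.10], [0.45, 0.40, 0.15]]
stream_b = [[0.30, 0.40, 0.30], [0.10, 0.80, 0.10]]
reference = [0, 1]

for w in (0.0, 0.5, 1.0):
    print(w, trustworthiness(stream_a, stream_b, reference, w))
print("best weight on grid:", best_weight(stream_a, stream_b, reference))
```

In this toy case each stream alone predicts only one of the two reference states, while an intermediate weight recovers both, which illustrates why the optimal weight depends on the properties of the specific input streams.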

DOI: 10.21437/Interspeech.2016-1050

Cite as

Bosch, L.t., Cranen, B., Sun, Y. (2016) Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR. Proc. Interspeech 2016, 3793-3797.

author={Louis ten Bosch and Bert Cranen and Yang Sun},
title={Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR},
booktitle={Interspeech 2016},