In this work, we validate the effectiveness of our recently proposed approach, which integrates template matching with statistical modeling, on four baseline systems whose phone recognition accuracies on the TIMIT task increase from 73% to 78%. The four baselines were generated using 1) discriminative training (DT) with the Minimum Phone Error (MPE) criterion, 2) MFCC features concatenated with ensemble Multiple Layer Perceptron features (MFCC+EMLP), 3) DT combined with the MFCC+EMLP features, and 4) data-sampling-based ensemble acoustic models integrated with DT and the MFCC+EMLP features. Experimental results from template-matching-based rescoring of the phone lattices generated by the baseline models show that our template matching approach yields consistent and significant improvements over all four baselines. The highest recognition accuracy, 79.55%, was obtained by rescoring the phone lattices produced by the ensemble acoustic model baseline.
Bibliographic reference. Sun, Xie / Chen, Xin / Zhao, Yunxin (2011): "On the effectiveness of statistical modeling based template matching approach for continuous speech recognition", In INTERSPEECH-2011, 453-456.