Enhancing Data-Driven Phone Confusions Using Restricted Recognition

Mark Kane, Julie Carson-Berndsen

This paper presents a novel approach to address data sparseness in standard confusion matrices and demonstrates how enhanced matrices, which capture additional similarities, can impact the performance of spoken term detection. Using the same training data as for the standard phone confusion matrix, an enhanced confusion matrix is created by iteratively restricting the recognition process to exclude one acoustic model per iteration. Since this results in a greater amount of confusion data for each phone, the enhanced confusion matrix encodes more similarities. The enhanced phone confusion matrices perform demonstrably better than standard confusion matrices on a spoken term detection task which uses both HMMs and DNNs.
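
The restricted-recognition idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the recogniser can be stood in for by a per-segment score matrix with one column per phone model, that recognition is a simple argmax over those scores, and that the confusions from every restricted pass are added to the standard counts. The function names `standard_confusions` and `enhanced_confusions` are hypothetical.

```python
import numpy as np


def standard_confusions(scores, refs):
    """Count (reference, recognised) phone pairs with all models available.

    scores: array of shape (segments, phones), higher means a better match
            (a stand-in for real acoustic model scores; an assumption).
    refs:   integer array of reference phone indices, one per segment.
    """
    n_phones = scores.shape[1]
    counts = np.zeros((n_phones, n_phones), dtype=int)
    hyps = scores.argmax(axis=1)           # unrestricted recognition
    np.add.at(counts, (refs, hyps), 1)     # rows: reference, columns: recognised
    return counts


def enhanced_confusions(scores, refs):
    """Add confusions from restricted passes, excluding one model per pass.

    Assumption: every segment is re-recognised in every pass and all
    resulting pairs are accumulated on top of the standard counts.
    """
    n_phones = scores.shape[1]
    counts = standard_confusions(scores, refs)
    for excluded in range(n_phones):
        restricted = scores.copy()
        restricted[:, excluded] = -np.inf  # this model can no longer be chosen
        hyps = restricted.argmax(axis=1)
        np.add.at(counts, (refs, hyps), 1)
    return counts
```

In this sketch, a segment that is recognised correctly in the unrestricted pass is forced onto its next-best model once its own model is excluded, so the same data yields off-diagonal counts that a standard matrix would never observe.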

DOI: 10.21437/Interspeech.2016-489

Cite as

Kane, M., Carson-Berndsen, J. (2016) Enhancing Data-Driven Phone Confusions Using Restricted Recognition. Proc. Interspeech 2016, 3693-3697.

author={Mark Kane and Julie Carson-Berndsen},
title={Enhancing Data-Driven Phone Confusions Using Restricted Recognition},
booktitle={Interspeech 2016},