Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings

Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon

Adapting Automatic Speech Recognition (ASR) systems to a new domain (channel, speaker, topic, etc.) remains a significant challenge because often only a limited amount of target-domain data is available for adapting the Acoustic Models (AMs). Moreover, unlike for Gaussian Mixture Models (GMMs), there is to date no established, efficient method for adapting current state-of-the-art Convolutional Neural Network (CNN) based AMs. In this paper, we explore various training algorithms for domain adaptation of CNN based speech recognition systems with limited acoustic training data. Our investigations yield three main contributions. First, introducing a weight decay based regularizer alongside the standard cross-entropy criterion can significantly improve recognition performance with as little as one hour of adaptation data. Second, the observed gains can be improved further with state-level Minimum Bayes Risk (sMBR) based sequence training. In addition to supervised training with limited amounts of data, we also study the effect of introducing unsupervised data at both the initial cross-entropy and the subsequent sequence training stages; our experiments show that unsupervised data helps under both criteria. Third, we investigate the effect of speaker diversity in the adaptation data: although final performance can vary widely depending on the speakers selected, regularization is required to obtain significant gains. Overall, we demonstrate that adapting neural network based acoustic models yields performance improvements of up to 24.8% relative.
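As a rough illustration of the first contribution, the sketch below adds a weight decay penalty to a per-frame cross-entropy loss. The abstract does not give the exact form of the regularizer, so the plain L2 penalty on the weights used here is an assumption, and the names `softmax`, `regularized_cross_entropy`, and `decay` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_cross_entropy(logits, target, weights, decay):
    """Cross entropy for one frame plus an L2 weight decay penalty.

    Assumption: the penalty is 0.5 * decay * sum(w^2) over the given weights.
    The paper's actual regularizer may differ in form.
    """
    probs = softmax(logits)
    ce = -math.log(probs[target])                        # standard cross-entropy term
    penalty = 0.5 * decay * sum(w * w for w in weights)  # weight decay term
    return ce + penalty


# Example: two output classes with equal scores, two weights, decay of 0.1.
loss = regularized_cross_entropy([0.0, 0.0], target=0, weights=[1.0, 2.0], decay=0.1)
```

With `decay` set to zero the function reduces to ordinary cross entropy; larger values penalize large weights more strongly, which is the usual way such a term limits overfitting when adaptation data is scarce.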

DOI: 10.21437/Interspeech.2016-1161

Cite as

Suzuki, M., Tachibana, R., Thomas, S., Ramabhadran, B., Saon, G. (2016) Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings. Proc. Interspeech 2016, 1588-1592.

author={Masayuki Suzuki and Ryuki Tachibana and Samuel Thomas and Bhuvana Ramabhadran and George Saon},
title={Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings},
booktitle={Interspeech 2016},