Gating Recurrent Enhanced Memory Neural Networks on Language Identification

Wang Geng, Yuanyuan Zhao, Wenfu Wang, Xinyuan Cai, Bo Xu

This paper proposes a novel memory neural network structure, the gating recurrent enhanced memory network (GREMN), to model long-range dependencies in temporal sequences for the language identification (LID) task at the acoustic frame level. The proposed GREMN is a stacked gating recurrent neural network (RNN) equipped with a learnable enhanced memory block near the classifier. It aims to capture the long-span history and some future contextual information of the sequential input. In addition, two optimization strategies are proposed: a coherent SortaGrad-like training mechanism and a hard sample score acquisition approach. These optimization strategies drastically boost the memory-network-based LID system, especially on training material with large disparity. Experimental results confirm that the proposed GREMN has strong sequential modeling and generalization ability: about 5% relative equal error rate (EER) reduction is obtained compared with gating RNNs of approximately the same size, and a 38.5% performance improvement is observed compared with a conventional i-Vector based LID system.
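For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate and a relative EER reduction are commonly computed. It is not the authors' evaluation code. The function names, the threshold sweep, and all scores and EER values below are made-up illustrations, not figures from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= threshold. The EER is taken at
    the threshold where false-accept and false-reject rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejects: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accepts: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative reduction: (baseline - proposed) / baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical scores, chosen only to exercise the functions.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(targets, nontargets)

# Hypothetical EERs: a drop from 10.0% to 9.5% is a 5% relative reduction.
rel = relative_reduction(0.100, 0.095)
```

A relative reduction is measured against the baseline's own error rate, so a 5% relative reduction corresponds to a much smaller absolute change in EER.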

DOI: 10.21437/Interspeech.2016-684

Cite as

Geng, W., Zhao, Y., Wang, W., Cai, X., Xu, B. (2016) Gating Recurrent Enhanced Memory Neural Networks on Language Identification. Proc. Interspeech 2016, 3280-3284.

author={Wang Geng and Yuanyuan Zhao and Wenfu Wang and Xinyuan Cai and Bo Xu},
title={Gating Recurrent Enhanced Memory Neural Networks on Language Identification},
booktitle={Interspeech 2016},