Active Memory Networks for Language Modeling

Oscar Chen, Anton Ragni, Mark Gales, Xie Chen

Predicting the next word given the history of preceding words can be challenging without meta-information such as the topic. Standard neural network language models represent the topic only implicitly, through the word history. In this work, a more explicit form of topic representation is obtained via an attention mechanism. Although this uses the same information as the standard model, it allows the parameters of the network to focus on different aspects of the task: the attention model provides a topic representation that is learned automatically from the data, whereas the recurrent model handles the (conditional) history representation. The combined model is therefore expected to relieve the standard model of having to handle multiple aspects at once. Experiments were conducted on the Penn Tree Bank and BBC Multi-Genre Broadcast News (MGB) corpora, where the proposed approach outperforms standard recurrent models in perplexity. Finally, N-best list rescoring for speech recognition on the MGB3 task shows word error rate improvements over comparable standard recurrent models.
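To make the idea of an attention-derived topic vector concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not specify the scoring function or how the two representations are combined, so dot-product scoring between the recurrent hidden state and the history word vectors, and concatenation of the hidden state with the attended "topic" vector, are both assumptions. The names `attend` and `combined_features` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, memory: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention over the word-history memory (assumed scoring).

    Returns the attention weights and the weighted sum of the memory vectors,
    which plays the role of a learned "topic" vector in this sketch.
    """
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    topic = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
    return weights, topic


def combined_features(hidden: Vector, memory: List[Vector]) -> Vector:
    # Hypothetical combination: concatenate the recurrent history state with
    # the attention-derived topic vector. An output layer (not shown) would
    # map this vector to next-word probabilities.
    _, topic = attend(hidden, memory)
    return hidden + topic


# Toy example: three 2-dimensional history word vectors and a hidden state.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_t = [2.0, 0.0]
weights, topic = attend(h_t, history)
features = combined_features(h_t, history)
```

In this toy run the first and third history words align with the hidden state, so they receive equal and larger weights than the second word, and the topic vector leans towards their shared direction. The design separates the two roles described above: the hidden state carries the ordered history, while the attention output summarises which history words matter.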

DOI: 10.21437/Interspeech.2018-78

Cite as: Chen, O., Ragni, A., Gales, M., Chen, X. (2018) Active Memory Networks for Language Modeling. Proc. Interspeech 2018, 3338-3342, DOI: 10.21437/Interspeech.2018-78.

  author={Oscar Chen and Anton Ragni and Mark Gales and Xie Chen},
  title={Active Memory Networks for Language Modeling},
  booktitle={Proc. Interspeech 2018},