Information Encoding by Deep Neural Networks: What Can We Learn?

Louis ten Bosch, Lou Boves

The recent advent of deep learning techniques in speech technology, and in automatic speech recognition in particular, has yielded substantial performance improvements. This suggests that deep neural networks (DNNs) are able to capture structure in speech data that older methods for acoustic modeling, such as Gaussian Mixture Models and shallow neural networks, fail to uncover. In image recognition it is possible to link representations in the first few layers of a DNN to structural properties of images and to representations in early layers of the visual cortex. This raises the question whether a similar feat can be accomplished with the representations on DNN layers when processing speech input. In this paper we present three different experiments in which we attempt to untangle how DNNs encode speech signals and to relate these representations to phonetic knowledge, with the aim of advancing conventional phonetic concepts and of choosing the topology of a DNN more efficiently. Two experiments investigate representations formed by auto-encoders. A third experiment investigates representations on convolutional layers that treat speech spectrograms as if they were images. The results lay the basis for future experiments with recursive networks.

DOI: 10.21437/Interspeech.2018-1896

Cite as: ten Bosch, L., Boves, L. (2018) Information Encoding by Deep Neural Networks: What Can We Learn?. Proc. Interspeech 2018, 1457-1461, DOI: 10.21437/Interspeech.2018-1896.

@inproceedings{tenBosch2018,
  author={Louis {ten Bosch} and Lou Boves},
  title={Information Encoding by Deep Neural Networks: What Can We Learn?},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={1457--1461},
  doi={10.21437/Interspeech.2018-1896}
}