13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Parallel Training for Deep Stacking Networks

Li Deng (1), Brian Hutchinson (2), Dong Yu (1)

(1) Microsoft Research, Redmond, WA, USA
(2) University of Washington, Seattle, WA, USA

The deep stacking network (DSN) is a special type of deep architecture developed to enable parallel learning of its weight parameters distributed over large CPU clusters. This capability for parallel learning is unique among the deep models explored so far. The DSN is a prospective key component of next-generation speech recognizers, and its architectural design and parallel learning enable it to scale over a potentially unlimited amount of training data and over CPU clusters. In this paper, we present our first parallel implementation of the DSN learning algorithm. In particular, we show the tradeoff between the time and memory savings gained from a high degree of parallelism and the associated cost of inter-CPU communication. In addition, in phone classification experiments, we demonstrate that the DSN achieves a significantly lower error rate with full-batch training, which is enabled by the parallel implementation on a CPU cluster, than with the corresponding mini-batch training used prior to the work reported in this paper.
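
The link between full-batch training and parallelism can be illustrated with a minimal sketch. It is not the DSN learning algorithm from the paper: it assumes a toy one-parameter linear model, a squared-error loss, and Python threads standing in for cluster nodes, and all function names are hypothetical. It shows only that a full-batch gradient is a sum of per-shard gradients, so each worker can process its own data shard and communicate a single partial result.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, shard):
    """Gradient of sum(0.5 * (w*x - y)**2) over one data shard."""
    return sum((w * x - y) * x for x, y in shard)


def full_batch_gradient(w, shards, workers=2):
    """Compute per-shard gradients concurrently and sum them.

    The sum equals the gradient over the whole data set; only one
    number per shard has to be communicated back (an assumption of
    this toy setup, where the model has a single parameter).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(w, s), shards))
    return sum(partials)


# Toy data generated by y = 2x, split into two shards (hypothetical).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(50):
    w -= 0.02 * full_batch_gradient(w, shards)  # full-batch gradient step
```

In this sketch the communication cost grows with the number of shards while the per-worker computation shrinks, which is the kind of tradeoff the abstract refers to; the actual costs in the paper depend on the DSN implementation and cluster, which are not reproduced here.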

Index Terms: parallel and distributed computing, deep stacking networks, full-batch training, phone classification

Bibliographic reference.  Deng, Li / Hutchinson, Brian / Yu, Dong (2012): "Parallel training for deep stacking networks", In INTERSPEECH-2012, 2598-2601.