Multilingual Deep Neural Network Training Using Cyclical Learning Rate

Andreas Søeborg Kirkedal, Yeon-Jun Kim

Deep Neural Network (DNN) acoustic models are an essential component in automatic speech recognition (ASR). The main accuracy improvements in ASR come from training DNN models, which require large amounts of supervised data and substantial computational resources. While the availability of sufficient monolingual data is a challenge for low-resource languages, the computational requirements for resource-rich languages increase significantly with the availability of large data sets. In this work, we provide novel solutions to these two challenges in the context of training a feed-forward DNN acoustic model (AM) for mobile voice search. To address the data-sparsity challenge, we bootstrap our multilingual AM using data from languages in the same language family. To reduce training time, we use a cyclical learning rate (CLR), which has demonstrated fast convergence with competitive or better performance when training neural networks on text and image tasks. We reduce training time for our Mandarin Chinese AM with 81.4% token accuracy from 40 to 21.3 hours and increase word accuracy on three Romance languages by 2-5% with multilingual AMs compared to monolingual DNN baselines.
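The abstract does not show the schedule itself; as a reference, the triangular cyclical learning rate from Smith's CLR work (which the paper builds on) can be sketched as below. The parameter values (`base_lr`, `max_lr`, `step_size`) are illustrative defaults, not the values used in the paper.

```python
def triangular_clr(step, base_lr=1e-3, max_lr=6e-3, step_size=2000):
    """Triangular cyclical learning rate schedule (Smith, 2017).

    The learning rate rises linearly from base_lr to max_lr over
    step_size training steps, then falls back to base_lr over the
    next step_size steps, and repeats.
    """
    cycle = step // (2 * step_size)            # index of the current cycle
    x = abs(step / step_size - 2 * cycle - 1)  # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

At step 0 the schedule starts at `base_lr`, peaks at `max_lr` midway through each cycle (here, at step 2000), and returns to `base_lr` at the cycle boundary, which is the fast-convergence behavior the paper exploits to cut training time.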

DOI: 10.21437/Interspeech.2018-1891

Cite as: Kirkedal, A.S., Kim, Y. (2018) Multilingual Deep Neural Network Training Using Cyclical Learning Rate. Proc. Interspeech 2018, 2933-2937, DOI: 10.21437/Interspeech.2018-1891.

@inproceedings{kirkedal18_interspeech,
  author={Andreas Søeborg Kirkedal and Yeon-Jun Kim},
  title={Multilingual Deep Neural Network Training Using Cyclical Learning Rate},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={2933--2937},
  doi={10.21437/Interspeech.2018-1891}
}