Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition

Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee


Neural architecture search (NAS) has been successfully applied to finding efficient, high-performance deep neural network architectures in a task-adaptive manner without extensive human intervention. This is achieved by choosing genetic, reinforcement-learning, or gradient-based algorithms as automated alternatives to manual architecture design. However, a naive application of existing NAS algorithms to a different task may yield architectures that perform worse than manually designed ones. In this work, we show that NAS can provide efficient architectures that outperform manually designed attention-based architectures on speech recognition tasks; we name the resulting architecture the Evolved Speech-Transformer (EST). By combining a carefully designed search space with progressive dynamic hurdles, a genetic-algorithm-based search method, our algorithm finds a memory-efficient architecture that outperforms the vanilla Transformer while requiring less training time.
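As a rough illustration of the search procedure the abstract describes, the Python sketch below combines a simple evolutionary loop with progressive dynamic hurdles: candidates first train under a small step budget, and only those whose fitness clears the current hurdle earn additional training steps. This is a minimal sketch, not the authors' implementation; the train_and_evaluate and mutate stubs, the mean-fitness hurdle, and the step budgets are all illustrative assumptions.

import random

def train_and_evaluate(architecture, num_steps):
    # Hypothetical stub: train `architecture` for `num_steps` and return a
    # fitness score (e.g., negative validation loss on a dev set).
    return random.random()

def mutate(architecture):
    # Hypothetical stub: return a randomly perturbed copy of `architecture`
    # drawn from the search space.
    return architecture

def evolve_with_pdh(initial_population, step_budgets, generations):
    # Evolutionary search with progressive dynamic hurdles: each candidate is
    # first trained with the smallest step budget, and only candidates whose
    # fitness clears the current hurdle (here, the mean fitness of models
    # scored so far) are granted the next, larger budget.
    population = [{"arch": a, "fitness": 0.0} for a in initial_population]
    for _ in range(generations):
        scored = []
        for cand in population:
            fitness = train_and_evaluate(cand["arch"], step_budgets[0])
            for budget in step_budgets[1:]:
                hurdle = sum(scored) / len(scored) if scored else 0.0
                if fitness < hurdle:
                    break  # weak candidate: stop spending compute on it
                fitness = train_and_evaluate(cand["arch"], budget)
            cand["fitness"] = fitness
            scored.append(fitness)
        # Keep the fitter half, refill the population by mutating survivors.
        population.sort(key=lambda c: c["fitness"], reverse=True)
        survivors = population[: len(population) // 2]
        children = [{"arch": mutate(s["arch"]), "fitness": 0.0}
                    for s in survivors]
        population = survivors + children
    return max(population, key=lambda c: c["fitness"])["arch"]

# Toy usage: architectures are represented by integer IDs here.
best = evolve_with_pdh(initial_population=list(range(8)),
                       step_budgets=[1000, 5000, 20000],
                       generations=4)

The key design point is that the hurdle allocates compute unevenly: weak candidates are cut off after a cheap partial training run, so the bulk of the training budget goes to the most promising architectures.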


DOI: 10.21437/Interspeech.2020-1233

Cite as: Kim, J., Wang, J., Kim, S., Lee, Y. (2020) Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition. Proc. Interspeech 2020, 1788-1792, DOI: 10.21437/Interspeech.2020-1233.


@inproceedings{Kim2020,
  author={Jihwan Kim and Jisung Wang and Sangki Kim and Yeha Lee},
  title={{Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition}},
  year={2020},
  booktitle={Proc. Interspeech 2020},
  pages={1788--1792},
  doi={10.21437/Interspeech.2020-1233},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1233}
}