Finnish ASR with Deep Transformer Models

Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo


Recently, BERT and Transformer-XL based architectures have achieved strong results in a range of NLP applications. In this paper, we explore two Transformer architectures, BERT and Transformer-XL, as language models for a Finnish ASR task with different rescoring schemes.

We achieve strong results with Transformer-XL in both an intrinsic and an extrinsic task, with 29% better perplexity and 3% better WER than our previous best LSTM-based approach. We also introduce a novel three-pass decoding scheme which improves the ASR performance by 8%. To the best of our knowledge, this is also the first work (i) to formulate an alpha smoothing framework to use the non-autoregressive BERT language model for an ASR task, and (ii) to explore sub-word units with Transformer-XL for an agglutinative language like Finnish.
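The abstract does not spell out the alpha smoothing formulation, so the sketch below only illustrates one common way a non-autoregressive BERT model can be used to rescore an ASR N-best list: each hypothesis is scored with a masked-LM pseudo-log-likelihood, which is then interpolated with the first-pass score through a weight alpha. The model name, the linear interpolation, and the helper functions are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Illustrative sketch only: the Finnish BERT checkpoint, the linear
# interpolation, and the function names are assumptions, not the
# paper's exact alpha smoothing formulation.
MODEL_NAME = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def bert_pseudo_loglik(sentence: str) -> float:
    """Sum of masked-token log-probabilities: mask each position in turn
    and score the original token with the masked language model."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def rescore_nbest(nbest, alpha=0.5):
    """nbest: list of (hypothesis_text, first_pass_score) pairs.
    Returns hypotheses sorted by the alpha-interpolated score."""
    scored = [(hyp, (1 - alpha) * fp + alpha * bert_pseudo_loglik(hyp))
              for hyp, fp in nbest]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

In such a setup, alpha would typically be tuned on a development set so that the BERT score complements, rather than replaces, the first-pass acoustic and language model scores.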


 DOI: 10.21437/Interspeech.2020-1784

Cite as: Jain, A., Rouhe, A., Grönroos, S., Kurimo, M. (2020) Finnish ASR with Deep Transformer Models. Proc. Interspeech 2020, 3630-3634, DOI: 10.21437/Interspeech.2020-1784.


@inproceedings{Jain2020,
  author={Abhilash Jain and Aku Rouhe and Stig-Arne Grönroos and Mikko Kurimo},
  title={{Finnish ASR with Deep Transformer Models}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3630--3634},
  doi={10.21437/Interspeech.2020-1784},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1784}
}