Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software

Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister


Speech recognition has become increasingly popular in radiology reporting over the last decade. However, developing a speech recognition system for a new language in a highly specific domain requires substantial resources, expert knowledge, and skills. As a result, commercial vendors do not offer ready-made radiology speech recognition systems for less-resourced languages.

This paper describes the implementation of a radiology speech recognition system for Estonian, a language with fewer than one million native speakers. The system was developed in partnership with a hospital that provided a corpus of written reports for language modeling. Rewrite rules for pre-processing the training texts and post-processing the recognition results were written manually with the Thrax toolkit, based on a small parallel corpus created by the hospital’s radiologists. Deep neural network based acoustic models were trained on 216 hours of out-of-domain data and adapted on 14 hours of spoken radiology data, using the Kaldi toolkit. The current word error rate of the system is 5.4%. The system is in active use in a real clinical environment.
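For readers unfamiliar with the metric quoted above, word error rate (WER) is conventionally defined as the word-level edit distance between the reference transcript and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. The Python sketch below illustrates that standard definition only. It is not the authors' scoring setup, which the abstract does not describe; the function name `word_error_rate` and the example word sequences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch: tokens are split on whitespace, with no text
    normalization (an assumption; real scoring pipelines usually normalize).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution and one deletion against four reference words.
example_wer = word_error_rate("a b c d", "a x c")
print(f"WER = {example_wer:.1%}")
```

Under this definition, a WER of 5.4% corresponds to roughly 5 to 6 word errors per 100 reference words.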


DOI: 10.21437/Interspeech.2017-928

Cite as: Alumäe, T., Paats, A., Fridolin, I., Meister, E. (2017) Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software. Proc. Interspeech 2017, 2168-2172, DOI: 10.21437/Interspeech.2017-928.


@inproceedings{Alumäe2017,
  author={Tanel Alumäe and Andrus Paats and Ivo Fridolin and Einar Meister},
  title={Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2168--2172},
  doi={10.21437/Interspeech.2017-928},
  url={http://dx.doi.org/10.21437/Interspeech.2017-928}
}