A Mostly Data-Driven Approach to Inverse Text Normalization

Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Platek, Donald McAllaster, Venki Nagesha


For an automatic speech recognition system to produce sensibly formatted, readable output, the spoken-form token sequence produced by the core speech recognizer must be converted to a written-form string. This process is known as inverse text normalization (ITN). Here we present a mostly data-driven ITN system that leverages a set of simple rules and a few hand-crafted grammars to cast ITN as a labeling problem. To this labeling problem, we apply a compact bi-directional LSTM. We show that the approach performs well using practical amounts of training data.
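To make the labeling formulation concrete, here is a minimal sketch of how per-token labels could turn a spoken-form token sequence into a written-form string. The label fields (`rewrite`, `prepend`, `append`, `space_after`) and the function `apply_labels` are hypothetical names chosen for illustration and are not taken from the paper. The labels are supplied by hand here; in the system described above they would be predicted by the bi-directional LSTM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    """Hypothetical per-token edit label (illustrative, not the paper's scheme)."""
    rewrite: Optional[str] = None  # written-form replacement; None keeps the token
    prepend: str = ""              # text inserted before the token
    append: str = ""               # text inserted after the token
    space_after: bool = True       # whether a space follows the token


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Apply one label per spoken-form token and join into a written-form string."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    pieces = []
    for token, label in zip(tokens, labels):
        text = token if label.rewrite is None else label.rewrite
        pieces.append(label.prepend + text + label.append)
        if label.space_after:
            pieces.append(" ")
    # Drop the trailing space left by the final token.
    return "".join(pieces).strip()


# Hand-written labels standing in for the tagger's predictions.
spoken = ["twenty", "percent", "off"]
predicted = [
    Label(rewrite="20", space_after=False),
    Label(rewrite="%"),
    Label(),
]
written = apply_labels(spoken, predicted)
print(written)  # 20% off
```

Under this framing, the model only has to choose one label per input token, so the output stays aligned with the input and the conversion reduces to a sequence tagging task.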


DOI: 10.21437/Interspeech.2017-1274

Cite as: Pusateri, E., Ambati, B.R., Brooks, E., Platek, O., McAllaster, D., Nagesha, V. (2017) A Mostly Data-Driven Approach to Inverse Text Normalization. Proc. Interspeech 2017, 2784-2788, DOI: 10.21437/Interspeech.2017-1274.


@inproceedings{Pusateri2017,
  author={Ernest Pusateri and Bharat Ram Ambati and Elizabeth Brooks and Ondrej Platek and Donald McAllaster and Venki Nagesha},
  title={A Mostly Data-Driven Approach to Inverse Text Normalization},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2784--2788},
  doi={10.21437/Interspeech.2017-1274},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1274}
}