CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency

Keyu An, Hongyu Xiang, Zhijian Ou


In this paper, we present a new open-source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data efficiency of the hybrid approach and the simplicity of the end-to-end (E2E) approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show that CAT obtains state-of-the-art results, comparable to those of the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope that CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community and can be further explored and improved.


DOI: 10.21437/Interspeech.2020-2732

Cite as: An, K., Xiang, H., Ou, Z. (2020) CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency. Proc. Interspeech 2020, 566-570, DOI: 10.21437/Interspeech.2020-2732.


@inproceedings{An2020,
  author={Keyu An and Hongyu Xiang and Zhijian Ou},
  title={{CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={566--570},
  doi={10.21437/Interspeech.2020-2732},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2732}
}