In this work we describe the ongoing development of the first automatic speech recognition (ASR) system for Choctaw (ISO 639-2: cho, endonym: Chahta), an indigenous language of the Americas. Choctaw is spoken by the Choctaw people, with an estimated 10,000 fluent speakers across three federally recognized Choctaw tribes. The language has subject-object-verb word order and is highly inflectional, with prefixes, suffixes, and infixes possible on a single verb base. It also has rhythmic lengthening, in which certain vowels are lengthened based on vowels in affixes. The motivations for developing an ASR system include: assisting in efforts to revitalize and reclaim the endangered language by aiding language learners; promoting additional contexts and scenarios for increased language use, such as conversations with automated dialogue systems; and supporting language documentation. We describe our collection of two-party conversational data and repetitions of prepared phrases from a diverse set of speakers, which was used to train the system. The ASR model was implemented using Kaldi. The model is currently trained and tested on a subset of the collected data, and achieves a word error rate (WER) of 49.35%.
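For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, and N is the number of words in the reference. The following is a minimal sketch of that standard definition, not the authors' scoring procedure; the function name `word_error_rate` and the English placeholder sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; it is not taken from the paper.
    Tokenization here is plain whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder example: one substitution ("cat" -> "dog") and one deletion
# ("down") against a four-word reference give 2 / 4.
print(word_error_rate("the cat sat down", "the dog sat"))  # 0.5
```

Under this definition, a WER of 49.35% means that roughly one word-level error occurs for every two reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.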
Cite as: Brixey, J., Traum, D. (2022) Towards an Automatic Speech Recognizer for the Choctaw language. Proc. 1st Workshop on Speech for Social Good (S4SG), 6-9, doi: 10.21437/S4SG.2022-2
@inproceedings{brixey22_s4sg,
  author={Jacqueline Brixey and David Traum},
  title={{Towards an Automatic Speech Recognizer for the Choctaw language}},
  year=2022,
  booktitle={Proc. 1st Workshop on Speech for Social Good (S4SG)},
  pages={6--9},
  doi={10.21437/S4SG.2022-2}
}