Language Teaching, Learning and Technology (LTLT 2015)

Leipzig, Germany
September 4, 2015

Preparing Children's Writing Database for Automated Processing

Rémi Lavalley (1), Kay Berkling (1), Sebastian Stüker (2)

(1) Cooperative State University; (2) Karlsruhe Institute of Technology (KIT)
Karlsruhe, Germany

This paper describes the process of anonymizing a publicly available German corpus of children’s spontaneously written texts from Grades 1-8, which were scanned and digitized. After a review of the previously published data collection process, the method for anonymizing the texts and metadata is described. A revised annotation set, added to the existing transcription, is then defined. This annotation supports spelling error analysis and adds further annotation at the syntax level so that spelling and syntax issues can be processed separately. Updated statistics for the new version of the data are reported to give the reader an idea of the research potential this version may provide.

Index Terms: Orthography, Corpora, Children’s Texts, Digitization, Anonymization

Bibliographic reference. Lavalley, Rémi / Berkling, Kay / Stüker, Sebastian (2015): "Preparing children's writing database for automated processing", In LTLT-2015, 9-15.