Balancing Word Lists in Speech Audiometry Through Large Spoken Language Corpora

Annemiek Hammer (1), Bart Vaerenberg (2), Wojtek Kowalczyk (3), Louis ten Bosch (4), Martine Coene (1), Paul J. Govaerts (2)

(1) Vrije Universiteit Amsterdam, The Netherlands
(2) Eargroup, Belgium
(3) Universiteit Leiden, The Netherlands
(4) Radboud Universiteit Nijmegen, The Netherlands

This paper describes a distance measure that estimates the distance between a language sample and a reference corpus with regard to graphemes, phonemes and the relation between them. The underlying assumption of this approach is that a language's phoneme distribution can be partially accessed via its graphemes. The advantage of using such a measure in speech audiometry is twofold: (i) it can be applied to determine how representative existing word lists are of the distribution of speech sounds in the target language of the test subject; (ii) it enables the audiologist to generate highly representative lists from large corpora of languages for which broad phonetic transcription is lacking. In this paper the development of the de novo distance measure is described and demonstrated for Dutch. The technique itself, however, is language-independent and has been applied successfully to 10 other EU languages. As such, it paves the way to generating representative word lists as part of speech audiometric test batteries for any given language.
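
The abstract does not state the form of the distance measure, so the following is only an illustrative sketch of the general idea: compare the relative grapheme frequencies of a word list with those of a reference corpus. The Euclidean distance, the restriction to single letters as graphemes, the function names and the tiny Dutch word sets are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter
import math


def grapheme_distribution(words):
    """Relative frequency of each letter over a list of words.

    Assumption: a grapheme is a single alphabetic character; multi-letter
    graphemes (e.g. Dutch "ij" or "oe") are not treated specially here.
    """
    counts = Counter(ch for w in words for ch in w.lower() if ch.isalpha())
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


def distribution_distance(sample, reference):
    """Euclidean distance between two relative-frequency distributions.

    Assumption: Euclidean distance stands in for the paper's measure,
    which the abstract does not specify.
    """
    keys = set(sample) | set(reference)
    return math.sqrt(
        sum((sample.get(k, 0.0) - reference.get(k, 0.0)) ** 2 for k in keys)
    )


# Hypothetical toy data: a "reference corpus" and a candidate word list.
reference_words = ["de", "kat", "zit", "op", "de", "mat", "en", "kijkt", "naar", "de", "maan"]
word_list = ["kat", "mat", "maan"]

reference = grapheme_distribution(reference_words)
distance = distribution_distance(grapheme_distribution(word_list), reference)
print(f"distance to reference: {distance:.3f}")
```

Under these assumptions, a smaller distance means the word list's grapheme distribution is closer to that of the reference corpus; a list identical in distribution to the corpus would score 0.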

