EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Concordancing for Parallel Spoken Language Corpora

Dafydd Gibbon (1), Thorsten Trippel (1), Serge Sharoff (2)

(1) Universität Bielefeld, Germany
(2) Russian Research Institute for Artificial Intelligence / Universität Bielefeld, Germany

Concordancing is one of the oldest corpus analysis techniques, used especially for written corpora. In NLP, concordancing appears in the training of speech recognition systems. In addition, comparative studies of different languages give rise to parallel corpora. Concordancing for such corpora in an NLP context is a new approach. We propose combining these fields of interest in a multi-purpose concordance for spoken language data, which makes it possible to combine corpus-linguistic and NLP methods and so provides a broader empirical basis for NLP research. Theoretical models for audio concordances are discussed. Principles for the structure and design of a parallel audio concordance are given; the concordance is encoded in XML to ensure reusability and flexibility, and time stamps are used to reference the signal from the annotations.
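As an illustration only, the idea of a keyword-in-context lookup over XML annotations that point into the signal by time stamps might be sketched as follows. The element and attribute names (`seg`, `tier`, `w`, `lang`, `start`, `end`), the sample data, and the function `concordance` are all hypothetical and are not taken from the paper; time stamps are assumed to be seconds into each language's own recording.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation format (an assumption, not the paper's schema):
# each <seg> is one aligned unit, each <tier> holds one language, and each
# <w> carries start/end time stamps (seconds) into that language's signal.
SAMPLE = """
<corpus>
  <seg id="s1">
    <tier lang="en">
      <w start="0.00" end="0.31">good</w>
      <w start="0.31" end="0.78">morning</w>
    </tier>
    <tier lang="de">
      <w start="0.00" end="0.29">guten</w>
      <w start="0.29" end="0.80">Morgen</w>
    </tier>
  </seg>
  <seg id="s2">
    <tier lang="en">
      <w start="1.10" end="1.42">good</w>
      <w start="1.42" end="1.90">night</w>
    </tier>
    <tier lang="de">
      <w start="1.05" end="1.36">gute</w>
      <w start="1.36" end="1.95">Nacht</w>
    </tier>
  </seg>
</corpus>
"""


def concordance(xml_text, keyword, lang, width=2):
    """Return keyword-in-context hits for `keyword` in the `lang` tier.

    Each hit holds the left/right context (up to `width` words), the time
    stamps of the keyword, and the text of the aligned tiers in the other
    languages of the same segment.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for seg in root.iter("seg"):
        # Map each language code to its list of word elements.
        tiers = {t.get("lang"): list(t.iter("w")) for t in seg.iter("tier")}
        words = tiers.get(lang, [])
        for i, w in enumerate(words):
            if (w.text or "").lower() != keyword.lower():
                continue
            hits.append({
                "seg": seg.get("id"),
                "left": " ".join(x.text for x in words[max(0, i - width):i]),
                "keyword": w.text,
                "right": " ".join(x.text for x in words[i + 1:i + 1 + width]),
                "start": float(w.get("start")),
                "end": float(w.get("end")),
                "parallel": {
                    other: " ".join(x.text for x in ws)
                    for other, ws in tiers.items()
                    if other != lang
                },
            })
    return hits


if __name__ == "__main__":
    for hit in concordance(SAMPLE, "good", "en"):
        print(hit)
```

In this sketch the `start` and `end` values of a hit could be handed to an audio player to retrieve the matching stretch of signal, while `parallel` gives the aligned text in the other language.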

Full Paper

Bibliographic reference.  Gibbon, Dafydd / Trippel, Thorsten / Sharoff, Serge (2001): "Concordancing for parallel spoken language corpora", In EUROSPEECH-2001, 2063-2066.