Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Empirical Evaluation of Human Performance and Agreement in Parsing Discourse Constituents in Spoken Dialogue

Giovanni Flammia, Victor Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

This paper presents an empirical study on the annotation of discourse units in spoken dialogues. The goal of this research is to examine whether task-oriented human-human dialogues can be structured as sequences of a small number of individual discourse segments that can be reliably end-pointed. The data used for this study is a corpus of 18 orthographic transcriptions of actual telephone conversations between customers and travel agents or Yellow Pages operators. We propose the use of a general agreement metric derived from the kappa coefficient, and we apply it to measure the level of agreement among human coders in bracketing discourse segments. Despite the apparent difficulty of this annotation task, we show that a level of agreement around 60% can be reached among at least three out of five coders with variable levels of expertise, using a minimal and theory-neutral set of annotation instructions.
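As a quick illustration of the underlying statistic (a sketch only; the paper's metric generalizes kappa to multiple coders and to segment bracketing, and this two-coder form with an invented example is not taken from the paper), Cohen's kappa corrects raw agreement for agreement expected by chance:

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' labels over the same items:
    kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed
    agreement rate and P(E) is the agreement expected by chance
    given each coder's label frequencies."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled identically
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical example: boundary (1) vs. non-boundary (0) decisions
# at 10 candidate positions; both the data and the 0/1 coding scheme
# are illustrative assumptions, not the paper's annotation scheme.
coder1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
coder2 = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(round(kappa(coder1, coder2), 3))  # 0.737
```

Here raw agreement is 90%, but because non-boundaries dominate, much of that is expected by chance, so kappa is lower (about 0.74); this correction is why kappa-style metrics are preferred over raw percent agreement for annotation studies.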


Bibliographic reference. Flammia, Giovanni / Zue, Victor (1995): "Empirical evaluation of human performance and agreement in parsing discourse constituents in spoken dialogue", in EUROSPEECH-1995, 1965-1968.