Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

A Comparison of Inter-Transcriber Reliability for Two Systems of Prosodic Annotation: RaP (Rhythm and Pitch) and ToBI (Tones and Break Indices)

Laura Dilley (1), Mara Breen (2), Marti Bolivar (2), John Kraemer (2), Edward Gibson (2)

(1) Ohio State University, USA; (2) Massachusetts Institute of Technology, USA

Agreement was investigated among five labelers for the use of two prosodic annotation systems: the ToBI (Tones and Break Indices) system [1,2] and the RaP (Rhythm and Pitch) system [3]. Each system permits the labeling of pitch accents and two levels of phrasal boundaries; RaP also permits labeling of speech rhythm and distinguishes multiple levels of prominence on syllables. After training with computerized materials and receiving expert feedback, coders applied each system to a corpus of read and spontaneous speech (36 minutes for ToBI and 19 minutes for RaP). Inter-coder reliability was computed according to two metrics: transcriber-syllable-pairs and the kappa statistic. High agreement was obtained for both systems for pitch accent presence, pitch accent type, boundary presence, boundary type, and, for RaP, presence and strength of metrical prominences. Agreement levels for ToBI were similar to those of previous studies [4,5], indicating that participants were proficient coders. Moreover, the high level of agreement demonstrated for the RaP system indicates that RaP is a viable alternative to ToBI for prosodic labeling of large speech corpora.
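The two reliability metrics named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the pairwise metric is taken to be the proportion of agreeing transcriber pairs pooled over all syllables, and kappa is taken to be the multi-rater (Fleiss) variant. The function names `pairwise_agreement` and `fleiss_kappa` and the toy labels ("A" for accented, "N" for unaccented) are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels):
    """Proportion of agreeing transcriber pairs, pooled over syllables.

    labels: one list per syllable, holding one label per transcriber.
    """
    agree = 0
    total = 0
    for syllable in labels:
        for a, b in combinations(syllable, 2):
            agree += int(a == b)
            total += 1
    return agree / total


def fleiss_kappa(labels):
    """Fleiss' kappa, assuming every syllable has the same number of raters.

    Undefined (division by zero) if all raters use one label throughout.
    """
    n_items = len(labels)
    n_raters = len(labels[0])
    counts = [Counter(syllable) for syllable in labels]
    categories = sorted({lab for syllable in labels for lab in syllable})

    # Observed agreement: mean over syllables of the agreeing-pair proportion.
    p_obs = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement from the overall label proportions.
    p_exp = sum(
        (sum(c[k] for c in counts) / (n_items * n_raters)) ** 2
        for k in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)


# Toy example: 4 syllables, 5 transcribers each (illustrative only).
toy = [
    ["A", "A", "A", "A", "A"],
    ["A", "A", "A", "A", "N"],
    ["N", "N", "N", "N", "N"],
    ["N", "N", "N", "A", "A"],
]
print(pairwise_agreement(toy))       # 0.75
print(round(fleiss_kappa(toy), 3))   # 0.495
```

In this toy case the raw pairwise agreement is 0.75, while kappa is lower (about 0.495) because it discounts the agreement expected by chance given how often each label is used overall.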

