Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Korean MULTEXT: A Korean Prosody Corpus

Sunhee Kim (1), Daniel Hirst (2), Hyongsil Cho (3), Ho-Young Lee (3), Minhwa Chung (3)

(1) Center for Humanities and Information, Seoul National University, Korea
(2) Laboratoire Parole et Langage, CNRS/Aix-Marseille Université, France
(3) Department of Linguistics, Seoul National University, Korea

This paper describes the contents of Korean MULTEXT, a Korean prosody corpus that is a Korean version of the speech database Eurom1. The corpus consists of about 2 hours of read speech, transcribed primarily in orthography (in the Korean alphabet and in a Romanized transcription), as well as in IPA and SAMPA. It also includes the original F0 values, stylized F0 values extracted using Momel, and hand-corrected F0 values. Prosodic events are annotated in two ways: automatically, using the INTSINT annotation algorithm, and manually, by labeling prosodic units with two tones on the hand-corrected pitch targets. The tone patterns resulting from the proposed Momel-based two-tone labeling are found to correspond to those defined in K-ToBI.

