5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Multilingual Prosodic Database

Estelle Campione, Jean Véronis

Université de Provence, France

We present a prosodic corpus in five languages (French, English, Italian, German and Spanish) comprising 4 hours and 20 minutes of speech from 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based are taken from the EUROM 1 speech database and consist of passages of about five sentences each. The corpus was stylized automatically by an algorithm that factors out microprosodic effects and represents the intonation contour of each utterance as a series of target points. Once these points are interpolated by a smooth curve (a spline), the resulting contour is indistinguishable from the original when re-synthesized, apart from a few detection errors. A symbolic coding of the 50,000 pitch movements in the corpus is also provided, along with a word-level time alignment of the orthographic transcription with the signal. The entire corpus was verified and manually corrected by experts for each language. It will be made available for research at production cost through the European Language Resources Association (ELRA).
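To illustrate how a contour can be rebuilt from target points and how movements between targets can be coded symbolically, here is a minimal sketch. It is not the authors' algorithm. The abstract does not specify the spline type, so the sketch assumes a quadratic spline in which each interval between two targets is bridged by two parabolas meeting at the midpoint, with zero slope at each target. The functions `quadratic_spline` and `code_movements`, the (time, F0) tuple representation, the U/D/S labels and the 1-semitone threshold are all hypothetical illustrations rather than the corpus's actual coding scheme.

```python
import math
from typing import List, Tuple

# Hypothetical representation of a target point: (time in seconds, F0 in Hz).
Target = Tuple[float, float]


def quadratic_spline(targets: List[Target], t: float) -> float:
    """Evaluate a smooth F0 curve through time-sorted target points at time t.

    Assumption (not from the source): each interval between two targets is
    bridged by two parabolas meeting at the interval midpoint, giving zero
    slope at every target and a continuous first derivative everywhere.
    """
    if not targets:
        raise ValueError("need at least one target point")
    # Outside the span of the targets, hold the nearest target value.
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (ta, fa), (tb, fb) in zip(targets, targets[1:]):
        if ta <= t <= tb:
            d = tb - ta
            if d == 0:
                return fb  # duplicate time stamps: no curve to draw
            delta = fb - fa
            if t <= ta + d / 2:
                # First parabola: leaves the left target with zero slope.
                return fa + 2 * delta * ((t - ta) / d) ** 2
            # Second parabola: arrives at the right target with zero slope.
            return fb - 2 * delta * ((tb - t) / d) ** 2
    raise ValueError("targets must be sorted by time")


def code_movements(targets: List[Target], threshold_st: float = 1.0) -> List[str]:
    """Label each movement between successive targets (hypothetical scheme).

    'U' = rise, 'D' = fall, 'S' = roughly level, using an assumed threshold
    expressed in semitones.
    """
    labels = []
    for (_, fa), (_, fb) in zip(targets, targets[1:]):
        semitones = 12 * math.log2(fb / fa)
        if semitones > threshold_st:
            labels.append("U")
        elif semitones < -threshold_st:
            labels.append("D")
        else:
            labels.append("S")
    return labels
```

For example, with targets `[(0.0, 120.0), (0.4, 180.0), (0.9, 110.0), (1.3, 112.0)]`, evaluating `quadratic_spline` at 0.2 s gives 150.0 Hz (halfway between the first two targets), and `code_movements` labels the three movements as a rise, a fall and a level stretch. Sampling `quadratic_spline` at a fixed step (say every 10 ms) would give a contour that could then be fed to a resynthesis tool for the kind of listening comparison the abstract describes.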


Bibliographic reference.  Campione, Estelle / Véronis, Jean (1998): "A multilingual prosodic database", In ICSLP-1998, paper 0844.