EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology
2nd INTERSPEECH Event

Aalborg, Denmark
September 3-7, 2001

Large Broadcast News and Read Speech Corpora of Spoken Czech

Josef Psutka (1), Vlasta Radova (1), Ludek Müller (1), Jindrich Matousek (1), Pavel Ircing (1), David Graff (2)

(1) University of West Bohemia in Pilsen, Czech Republic
(2) University of Pennsylvania, USA

This paper presents the first annotated and phonetically transcribed large speech corpora developed for spoken Czech. All corpora were collected over the last two years at the Department of Cybernetics, University of West Bohemia (UWB) in Pilsen. The first two collections are broadcast news; the third is a high-quality read-speech database. The paper describes the collection conditions, annotation, and phonetic transcription process for each corpus. The basic phonetic and lexical characteristics of all corpora are given and compared with one another.

Bibliographic reference. Psutka, Josef / Radova, Vlasta / Müller, Ludek / Matousek, Jindrich / Ircing, Pavel / Graff, David (2001): "Large broadcast news and read speech corpora of spoken Czech", in EUROSPEECH-2001, 2067-2070.