EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


African Speech Technology (AST) Telephone Speech Databases: Corpus Design and Contents

Philippa H. Louw (1), Justus C. Roux (1), Elizabeth C. Botha (2)

(1) University of Stellenbosch, South Africa
(2) University of Pretoria, South Africa

The African Speech Technology project is developing telephone speech databases for five of South Africa's eleven official languages: South African English, Afrikaans, Zulu, Xhosa and Southern Sotho. These databases will be fully transcribed, both orthographically and phonetically, and will be used to train and test phoneme-based, speaker-independent speech recognition systems. The project aims to deliver a software toolkit for developers of telephone speech applications. As a first application, a prototype multilingual enquiry and booking system for the hotel industry will be developed. This paper describes the design and contents of the speech corpus, which is currently being collected over both mobile and fixed networks. In particular, language coverage is discussed within the framework of the multilingual character of the South African population. Some language-specific differences in the contents of the different databases are noted. Methods and tools applied in the acquisition of phonetic information are discussed.


Bibliographic reference.  Louw, Philippa H. / Roux, Justus C. / Botha, Elizabeth C. (2001): "African speech technology (AST) telephone speech databases: corpus design and contents", In EUROSPEECH-2001, 2055-2058.