Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

The Design and Efficient Recording of a 3000 Speaker Scandinavian Telephone Speech Database: Rafael.0

Per Rosenbeck, Bo Baungaard, Claus Jacobsen, Dan-Joe Barry

R&D Department, Jydsk Telefon, Aarhus-Tranbjerg, Denmark

This paper presents experience with a new, efficient concept for establishing a large speech database for spoken language research and new Advanced Telephone Services (ATS). This Scandinavian telephone speech database, entitled Rafael.0, contains speech material from a total of 3000 speakers from Denmark, Norway and Sweden. The speakers for Rafael.0 have been carefully selected according to a predetermined distribution based on age, sex and regional language. A number of regional languages, comprising dialects and the standard language, have been defined using linguistic knowledge. The speakers in Rafael.0 have spoken connected digits and whole sentences, with emphasis on obtaining realistic and spontaneous telephone speech. A dedicated PC-based recording system, with an ISDN interface board connected to a digital exchange, has been developed for this project. The recording system proved to be very efficient and flexible for recording the speakers via domestic telephone lines. Rafael.0 is labelled at sentence level and, for a specific part, also at word level.

Bibliographic reference. Rosenbeck, Per / Baungaard, Bo / Jacobsen, Claus / Barry, Dan-Joe (1994): "The Design and Efficient Recording of a 3000 Speaker Scandinavian Telephone Speech Database: Rafael.0", In ICSLP-1994, 1807-1810.