Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

CEUDEX: A Data Base Oriented to Context-Dependent Units Training in Spanish for Continuous Speech Recognition

Celinda de la Torre-Munilla, Luis Hernandez-Gomez, Daniel Tapias

Speech Technology Group, Telefónica Investigación y Desarrollo, Madrid, Spain

In this paper we describe the design and recording of a new telephone speech database recorded at Telefónica Investigación y Desarrollo. It is designed for research in large-vocabulary, speaker-independent continuous speech recognition, speaker adaptation and speaker verification in Spanish over the telephone line. The database is composed of two sets: (a) CEUDEX, the main set, with a corpus of 400 phonetically balanced sentences; and (b) SPATIS, a task-oriented set inspired by the ATIS (Air Travel Information System) [9] standard application for English. This set will be used for task-independent tests of the continuous speech recognizer. In the first stage of the recording procedure, a total of 21,500 sentences from nearly 300 speakers were collected.


Bibliographic reference.  Torre-Munilla, Celinda de la / Hernandez-Gomez, Luis / Tapias, Daniel (1995): "CEUDEX: a data base oriented to context-dependent units training in Spanish for continuous speech recognition", In EUROSPEECH-1995, 845-848.