EUROSPEECH '95
Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Design of a Phonetic Corpus for a Speech Database in Basque Language

K. Lopez de Ipina (1), I. Torres (2), L. Onederra (3)

(1) Dpto. Automatica, Electronica e Ingenieria de Sistemas, Universidad Publica de Navarra/Nafarroako Unibertsitate Publikoa. Spain
(2) Dpto. Electricidad y Electronica, Universidad del Pais Vasco/Euskal Herriko Unibertsitatea. Spain
(3) Dpto. Filologia Vasca, Universidad del Pais Vasco/Euskal Herriko Unibertsitatea. Spain

Designing a continuous speech recognition system requires selecting a large amount of spoken data for each specific language. The goal of this work was to design a phonetic corpus for a speech database in the Basque language. Several samples of contemporary narrative, spoken language, and newspaper language were first analysed from a phonetic point of view. The final speech database consisted of a phonetic corpus of 300 phonetically balanced sentences, each uttered twice by 40 speakers, resulting in about 900,000 allophones. Two additional corpora of digits and short words completed the database. The database provides an adequate distribution of allophones and contexts for modelling Basque phones in both speech recognition systems and linguistic analysis frameworks.

Keywords: Speech Databases, Basque language.

Bibliographic reference. Lopez de Ipina, K. / Torres, I. / Onederra, L. (1995): "Design of a phonetic corpus for a speech database in Basque language", In EUROSPEECH-1995, 851-854.