EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Knowledge of Language Origin Improves Pronunciation Accuracy of Proper Names

Ariadna Font Llitjos, Alan W. Black

Carnegie Mellon Univ., USA

As no lexicon can have complete coverage, and a high proportion of unknown words are proper names, this paper addresses the problem of automatically finding pronunciations for unseen proper names in US English. Proper names, especially in the US, may come from a wide range of ethnic backgrounds. We present a model and results showing that including the ethnic origin of words in a statistical model can improve pronunciation results. We used a lexicon of 56,000 proper names from CMUDICT and gathered data (text and proper names) from 26 languages to build statistical models that provide an estimate of word origin. Tests against held-out data showed a 7.6% absolute improvement over a baseline of 54.8% when language-based features were added to our CART-based model. Our user studies show a 17% preference for the model with language features compared to the baseline.
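
The abstract does not say what form the word-origin models take, so the following is only a minimal sketch of the general idea, assuming one character-trigram model per language with add-one smoothing. The names `TrigramModel` and `origin_features` are hypothetical, and the tiny word lists are illustrative toy data, not the data gathered for the paper. The resulting feature dictionary is the kind of input that could be passed, alongside letter-context features, to a decision-tree learner.

```python
import math
from collections import Counter


def trigrams(word):
    """Return character trigrams of a word, padded with boundary markers."""
    padded = "##" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramModel:
    """Character-trigram model with add-one smoothing (an assumed design)."""

    def __init__(self, words):
        self.counts = Counter(t for w in words for t in trigrams(w))
        self.total = sum(self.counts.values())
        # One extra slot stands in for all unseen trigrams.
        self.vocab = len(self.counts) + 1

    def log_prob(self, word):
        """Average smoothed log-probability per trigram of the word."""
        grams = trigrams(word)
        denom = self.total + self.vocab
        score = sum(math.log((self.counts[g] + 1) / denom) for g in grams)
        return score / len(grams)


def origin_features(word, models):
    """Score a word under every language model and name the best match."""
    scores = {lang: m.log_prob(word) for lang, m in models.items()}
    features = {"score_" + lang: s for lang, s in scores.items()}
    features["best_origin"] = max(scores, key=scores.get)
    return features


# Toy training lists (illustrative only).
models = {
    "italian": TrigramModel(["rossi", "bianchi", "martini", "ferrari", "romano"]),
    "german": TrigramModel(["schmidt", "schneider", "fischer", "schwarz", "schultz"]),
}

features = origin_features("schmitz", models)
print(features["best_origin"])
```

Averaging the log-probability over the number of trigrams keeps scores comparable across names of different lengths; other normalizations would be equally reasonable in a sketch like this.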


Bibliographic reference. Llitjos, Ariadna Font / Black, Alan W. (2001): "Knowledge of language origin improves pronunciation accuracy of proper names", In EUROSPEECH-2001, 1919-1922.