EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Automatic Word Acquisition from Continuous Speech

Helmut Lucke, Masanori Omote

Sony Corporation, Japan

A method for learning lexical representations of unknown words in an unsupervised manner is described. The unknown words are automatically extracted from continuous speech, and a clustering algorithm is used to derive word clusters and lexical representations based on the set of phonetic units used in the system. In experiments, we verify the robustness of the approach. An interesting feature is that extraction errors usually do no harm: wrongly extracted words tend to occupy clusters by themselves and thus do not adversely affect the modeling of correctly extracted words.
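The clustering idea can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the distance measure, so everything here is an assumption made for illustration: extracted words are represented as tuples of phoneme labels, similarity is plain Levenshtein distance, clustering is a greedy single pass against each cluster's first member, and the lexical representation is taken to be the cluster medoid. The phoneme strings are invented toy data, not results from the paper.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, ...]  # hypothetical: one extracted word as a phoneme sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_tokens(tokens: List[Token], max_dist: int = 1) -> List[List[Token]]:
    """Greedy single-pass clustering (an assumed stand-in, not the paper's algorithm).

    Each token joins the first cluster whose first member lies within
    max_dist; otherwise it starts a new cluster of its own.
    """
    clusters: List[List[Token]] = []
    for tok in tokens:
        for cluster in clusters:
            if edit_distance(tok, cluster[0]) <= max_dist:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters


def representative(cluster: List[Token]) -> Token:
    """Medoid: the member with the smallest total distance to the others."""
    return min(cluster, key=lambda c: sum(edit_distance(c, o) for o in cluster))


# Invented toy data: two recurring "words" plus one spurious extraction.
tokens: List[Token] = [
    ("s", "o", "n", "i"),
    ("s", "o", "n", "i"),
    ("s", "o", "m", "i"),
    ("k", "a", "t", "o"),
    ("k", "a", "d", "o"),
    ("a", "e", "r", "u", "p"),  # spurious extraction
]
clusters = cluster_tokens(tokens)
for c in clusters:
    print(len(c), representative(c))
```

In this toy setting the spurious token is far from every other sequence, so it ends up in a cluster of its own and has no influence on the representatives chosen for the two recurring words, which mirrors the behavior the abstract reports for extraction errors.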


Bibliographic reference.  Lucke, Helmut / Omote, Masanori (2001): "Automatic word acquisition from continuous speech", In EUROSPEECH-2001, 2667-2670.