EUROSPEECH 2001 Scandinavia
A method for learning lexical representations of unknown words in an unsupervised manner is described. Unknown words are automatically extracted from continuous speech, and a clustering algorithm is used to derive word clusters and lexical representations based on the set of phonetic units used in the system. Experiments confirm the robustness of the approach. An interesting feature is that extraction errors usually do no harm: wrongly extracted words tend to form clusters of their own and thus do not adversely affect the modeling of correctly extracted words.
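The abstract does not say which clustering algorithm is used or how a cluster's lexical representation is chosen. As a purely illustrative sketch, assume each extracted word token is a sequence of phone labels. Tokens are grouped by single-linkage clustering on a length-normalized phone-level edit distance, and each cluster's medoid is taken as its lexical representation. The function names (`cluster_tokens`, `medoid`), the 0.34 threshold and the toy phone strings are hypothetical and are not the authors' method.

```python
from typing import Dict, List, Sequence

Phones = Sequence[str]  # one extracted word token as a sequence of phone labels


def edit_distance(a: Phones, b: Phones) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a: Phones, b: Phones) -> float:
    """Edit distance scaled by the longer sequence, so it lies in [0, 1]."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_tokens(tokens: List[Phones], threshold: float = 0.34) -> List[List[int]]:
    """Single-linkage clustering (hypothetical choice) using union-find.

    Two tokens end up in the same cluster if a chain of tokens connects them
    in which each neighbouring pair is within `threshold`.
    """
    parent = list(range(len(tokens)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if normalized_distance(tokens[i], tokens[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(tokens)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def medoid(tokens: List[Phones], members: List[int]) -> Phones:
    """Member with the smallest total edit distance to the rest of its cluster."""
    return min((tokens[m] for m in members),
               key=lambda t: sum(edit_distance(t, tokens[o]) for o in members))


if __name__ == "__main__":
    # Toy tokens: two words with pronunciation variants plus one mis-extraction.
    extracted = [list("tokyo"), list("tokio"), list("tokyo"),
                 list("kyoto"), list("kioto"), list("sami")]
    for members in cluster_tokens(extracted):
        print(members, "".join(medoid(extracted, members)))
```

On this toy input the variants of each word are grouped together, and the spurious token `sami` ends up in a cluster of its own, which mirrors the behaviour the abstract describes. Single linkage is used only because it is simple; a real system would likely need a stronger distance (for example, acoustic confusability between phones) and a principled stopping criterion.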
Bibliographic reference. Lucke, Helmut / Omote, Masanori (2001): "Automatic word acquisition from continuous speech", In EUROSPEECH-2001, 2667-2670.