EUROSPEECH 2001 Scandinavia
This paper describes preliminary recognition experiments on PhoneBook, a corpus of isolated, telephone-bandwidth, read words from a large (almost 8,000-word) vocabulary. We have chosen this corpus as a testbed for experiments on the language-model-independent parts of a segment-based recognizer. We present results showing that a segment-based recognizer performs well on this task, and that a simple Gaussian mixture phone duration model significantly reduces the error rate. We compare context-independent, stress-dependent, and word-position-dependent duration models and obtain relative error rate reductions of up to 12% on the test set. Finally, we make some observations regarding the effects of stress and word position in this isolated-word task and discuss our plans for further research using PhoneBook.
Bibliographic reference. Livescu, Karen / Glass, James (2001): "Segment-based recognition on the PhoneBook task: initial results and observations on duration modeling", in EUROSPEECH-2001, 1437-1440.