4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Context Modeling and Clustering in Continuous Speech Recognition

Jean-Claude Junqua, Lorenzo Vassallo

(1) Speech Technology Laboratory, Panasonic Technologies Inc., Santa Barbara, CA, USA
(2) Eurécom, Sophia Antipolis, France

In this paper, we report on the performance of two variants of well-known statistical clustering techniques and present an evaluation on the TIMIT and TI-Digit databases. The best results were obtained with a clustering approach that 1) is based on a divergence criterion, 2) separates "good" and "bad" models using a class-dependent adjustable threshold on the number of examples per model, and 3) guides the clustering by keeping the number of models per class between two constants, Nmin and Nmax. On the TI-Digit database, the combination of triphone modeling and divergence-based clustering yielded greater accuracy than word models at a similar system complexity.
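
The three ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: each model is a single diagonal Gaussian with an example count, the divergence is the symmetric Kullback-Leibler divergence, "bad" models are folded into their nearest "good" model, and the names divergence, merge, cluster_class, min_examples, n_min and n_max are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def divergence(a, b):
    """Symmetric KL divergence between two diagonal Gaussians (assumed criterion)."""
    total = 0.0
    for ma, va, mb, vb in zip(a["mean"], a["var"], b["mean"], b["var"]):
        d2 = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * d2 * (1.0 / va + 1.0 / vb)
    return total


def merge(a, b):
    """Pool two models by count-weighted moment matching; returns a new model."""
    na, nb = a["count"], b["count"]
    n = na + nb
    mean, var = [], []
    for ma, va, mb, vb in zip(a["mean"], a["var"], b["mean"], b["var"]):
        m = (na * ma + nb * mb) / n
        second_moment = (na * (va + ma * ma) + nb * (vb + mb * mb)) / n
        mean.append(m)
        var.append(second_moment - m * m)
    return {"mean": mean, "var": var, "count": n}


def cluster_class(models, min_examples, n_min, n_max):
    """Cluster the context models of one class (hypothetical procedure).

    min_examples is the class-dependent threshold separating "good" from
    "bad" models; n_min and n_max bound the number of models kept.
    """
    assert 1 <= n_min <= n_max
    ranked = sorted(models, key=lambda m: m["count"], reverse=True)
    good = [m for m in ranked if m["count"] >= min_examples]
    bad = [m for m in ranked if m["count"] < min_examples]

    # Promote the best-trained "bad" models if fewer than n_min are "good".
    while len(good) < n_min and bad:
        good.append(bad.pop(0))
    if not good:
        return []

    # Fold each remaining "bad" model into its nearest "good" model.
    for m in bad:
        i = min(range(len(good)), key=lambda k: divergence(m, good[k]))
        good[i] = merge(good[i], m)

    # Merge the closest pair until at most n_max models remain.
    while len(good) > n_max:
        i, j = min(
            combinations(range(len(good)), 2),
            key=lambda p: divergence(good[p[0]], good[p[1]]),
        )
        merged = merge(good[i], good[j])
        good = [m for k, m in enumerate(good) if k not in (i, j)] + [merged]
    return good


# Toy one-dimensional class: three well-trained contexts and one sparse one.
toy_models = [
    {"mean": [0.0], "var": [1.0], "count": 100},
    {"mean": [0.1], "var": [1.0], "count": 80},
    {"mean": [5.0], "var": [1.0], "count": 90},
    {"mean": [5.2], "var": [1.0], "count": 3},
]
clusters = cluster_class(toy_models, min_examples=10, n_min=1, n_max=2)
```

In this toy run the sparse context is absorbed by its nearest well-trained neighbour, and the two contexts with nearly identical means are then merged so that the class ends with n_max models.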

Bibliographic reference. Junqua, Jean-Claude / Vassallo, Lorenzo (1996): "Context modeling and clustering in continuous speech recognition", in ICSLP-1996, 2262-2265.