5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Creating Large Subword Units for Speech Recognition

Thilo Pfau (1), Manfred Beham (2), W. Reichl (3), Günther Ruske (1)

(1) Institute for Human-Machine-Communication, Technical University of Munich, Germany
(2) pc-plus-COMPUTING, Munich, Germany
(3) Dialogue Systems Research Department, Bell Laboratories, Lucent Technologies, Murray Hill, NJ, USA

This paper addresses the choice of suitable subword units (SWU) for an HMM-based speech recognition system. Using demisyllables (including phonemes) as base units, an inventory of larger, domain-specific subword units, so-called macro-demisyllables (MDS), is created. A quality measure for the automatic decomposition of all single words into subword units is presented; it takes into account the trainability of the chosen units. The whole inventory is created by an iterative procedure guided by this predefined quality measure. Each MDS is represented by a dedicated HMM. Because the densities of specific phonemes are tied, only the number of mixture coefficients and transitions increases in comparison to the original phoneme models. Recognition experiments within the German Verbmobil evaluation 1996 show that the new, simple MDS models are as powerful as standard triphone models, even though the MDS models are so far context-independent.
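The abstract does not spell out the quality measure or the details of the iterative procedure, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a greedy, pairwise merging scheme in which a simple corpus-frequency threshold stands in for "trainability": two adjacent units are joined into a larger unit only if the joined unit would occur at least `min_count` times in the training data. The function names, the `+` joining symbol, the toy transcriptions, and the word counts are all hypothetical.

```python
from collections import Counter


def merge_pair(units, pair):
    """Replace each adjacent occurrence of `pair` in `units` with one joined unit."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + "+" + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def build_inventory(lexicon, word_counts, min_count):
    """Greedily merge the most frequent adjacent unit pair until no pair
    reaches `min_count` (an assumed stand-in for the trainability criterion)."""
    lexicon = {word: list(units) for word, units in lexicon.items()}
    while True:
        # Count every adjacent pair, weighted by how often its word occurs.
        pair_counts = Counter()
        for word, units in lexicon.items():
            for pair in zip(units, units[1:]):
                pair_counts[pair] += word_counts[word]
        if not pair_counts:
            break
        # Ties are broken by the pair itself so the result is deterministic.
        best, count = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < min_count:
            break
        lexicon = {word: merge_pair(units, best) for word, units in lexicon.items()}
    inventory = {unit for units in lexicon.values() for unit in units}
    return lexicon, inventory


# Hypothetical base-unit transcriptions and word frequencies.
toy_lexicon = {
    "guten": ["gu", "t", "@n"],
    "gutes": ["gu", "t", "@s"],
    "tag": ["ta", "k"],
}
toy_counts = {"guten": 50, "gutes": 30, "tag": 40}

decomposed, units = build_inventory(toy_lexicon, toy_counts, min_count=60)
```

In this toy setup only the pair `gu` and `t` occurs often enough (80 times) to be merged, so rare combinations stay decomposed into their smaller, better-trained base units. A real quality measure would likely weigh more than raw frequency.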

Bibliographic reference. Pfau, Thilo / Beham, Manfred / Reichl, W. / Ruske, Günther (1997): "Creating large subword units for speech recognition", in EUROSPEECH-1997, 1191-1194.