EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Time and Memory Efficient Viterbi Decoding for LVCSR using a Precompiled Search Network

Daniel Willett, Erik McDermott, Yasuhiro Minami, Shigeru Katagiri

NTT Communication Science Laboratories, Japan

In this paper, we present our recently developed time-synchronous speech recognition decoder, which represents the search space of Large Vocabulary Continuous Speech Recognition (LVCSR) as a single precompiled network. In particular, we outline our approaches to time- and memory-efficient Viterbi decoding in this setting. These include reducing fast (main) memory requirements by keeping the search network on disk and loading only the required parts on demand. Evaluations are carried out on a difficult Japanese LVCSR task that involves a back-off trigram language model and full cross-word-dependent triphone acoustic models. This time and memory efficiency enables real-time Viterbi decoding of entire lecture speeches in a single time-synchronous pass with a search error of less than 1%.
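
The general idea of time-synchronous Viterbi decoding over a network whose parts are fetched only when needed can be illustrated with a minimal sketch. This is not the decoder described in the paper: the toy network, the dictionary standing in for on-disk storage, the small LRU cache, the beam width, and all names (`NETWORK_ON_DISK`, `load_arcs`, `viterbi_decode`) are assumptions made purely for illustration.

```python
import math
from functools import lru_cache

# Hypothetical toy network standing in for a precompiled search network on disk.
# state -> list of (next_state, log_transition_prob, emitted_symbol)
NETWORK_ON_DISK = {
    0: [(0, math.log(0.5), "a"), (1, math.log(0.5), "b")],
    1: [(1, math.log(0.5), "b"), (2, math.log(0.5), "c")],
    2: [(2, math.log(1.0), "c")],
}

LOADS = []  # records every simulated disk fetch, for inspection


@lru_cache(maxsize=2)  # assumption: only a few states are held in fast memory
def load_arcs(state):
    """Fetch the outgoing arcs of one state on demand (simulated disk read)."""
    LOADS.append(state)
    return tuple(NETWORK_ON_DISK[state])


def viterbi_decode(observations, start_state=0, beam=10.0):
    """Single-pass, time-synchronous Viterbi search with beam pruning.

    observations: one dict per frame mapping symbol -> acoustic log-likelihood.
    Returns (best_log_score, best_symbol_path).
    """
    # Active hypotheses: state -> (best log score so far, symbol path)
    active = {start_state: (0.0, [])}
    for frame in observations:
        expanded = {}
        for state, (score, path) in active.items():
            # Only arcs of currently active states are ever loaded.
            for dest, log_trans, symbol in load_arcs(state):
                new_score = score + log_trans + frame.get(symbol, -math.inf)
                # Viterbi recombination: keep the best path into each state.
                if dest not in expanded or new_score > expanded[dest][0]:
                    expanded[dest] = (new_score, path + [symbol])
        if not expanded:
            raise ValueError("no active hypotheses survive this frame")
        # Beam pruning: drop hypotheses far below the current best score.
        best = max(score for score, _ in expanded.values())
        active = {s: h for s, h in expanded.items() if h[0] >= best - beam}
    return max(active.values(), key=lambda h: h[0])
```

Calling `viterbi_decode` with one dictionary of log-likelihoods per frame returns the best score and symbol sequence, and `LOADS` then lists which states were actually fetched. In this sketch, states that never become active are never loaded, which is the property that keeps fast-memory use small.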

Bibliographic reference. Willett, Daniel / McDermott, Erik / Minami, Yasuhiro / Katagiri, Shigeru (2001): "Time and memory efficient Viterbi decoding for LVCSR using a precompiled search network", In EUROSPEECH-2001, 847-850.