Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


A Matrix Representation of HMM-Based Speech Recognition Algorithms

Shigeki Sagayama

ATR Interpreting Telephony Research Laboratories, Kyoto, Japan

This paper describes a matrix representation from which we can derive a new formulation of HMM-based speech recognition algorithms. This idea provides not only an alternative mathematical formulation equivalent to the conventional trellis and Viterbi algorithms, but also a better understanding of HMM algorithms under grammatical constraints, as well as possibilities for more efficient computation. In this formulation, a likelihood matrix is defined as an (N + 1) x (N + 1) upper triangular matrix whose (t, s) component is the observation likelihood of the given signal over the time span from t + 1 to s. First, it is shown that the likelihood matrix for a pair of serially connected signal sources is the product of their matrices (P = P1 P2), and that a parallel connection is represented by their sum (P = P1 + P2). From these basic properties, matrix-based HMM computation algorithms are derived. Explicit duration control can easily be applied at all levels, such as state, phoneme, syllable, and word. Grammatical rewriting rules are directly interpreted as matrix operations. A matrix parser is suggested as a generalization of the CYK parser. This algorithm is particularly effective in large-vocabulary systems where the same phone units (phonemes) appear in many syntactic paths.
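
The two basic properties stated in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names (`likelihood_matrix`, `serial`, `parallel`) are hypothetical, and the toy source model is an assumption, namely that each source emits frames independently with a given per-frame likelihood and occupies at least one frame, so the diagonal is zero. Under those assumptions, the matrix product sums over every possible boundary between two sources in series, and the matrix sum combines two alternative sources.

```python
def likelihood_matrix(frame_likelihoods):
    """Build the (N + 1) x (N + 1) upper triangular likelihood matrix.

    Assumed toy model: P[t][s] is the product of the per-frame
    likelihoods of frames t + 1 .. s, and a source must cover at
    least one frame, so P[t][s] = 0 whenever s <= t.
    """
    n = len(frame_likelihoods)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for t in range(n + 1):
        acc = 1.0
        for s in range(t + 1, n + 1):
            acc *= frame_likelihoods[s - 1]  # frame s is stored at index s - 1
            P[t][s] = acc
    return P


def serial(P1, P2):
    """Serial connection: the matrix product P1 P2.

    Entry (t, s) sums P1[t][u] * P2[u][s] over every boundary u,
    i.e. over every way to split the span between the two sources.
    """
    size = len(P1)
    return [
        [sum(P1[t][u] * P2[u][s] for u in range(size)) for s in range(size)]
        for t in range(size)
    ]


def parallel(P1, P2):
    """Parallel connection: the element-wise sum P1 + P2."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]


# Toy per-frame likelihoods of a 3-frame signal under two sources (made-up values).
PA = likelihood_matrix([0.5, 0.25, 0.5])
PB = likelihood_matrix([0.25, 0.5, 0.5])

PAB = serial(PA, PB)       # source A followed by source B
PA_or_B = parallel(PA, PB)  # source A or source B

# Likelihood of the whole signal (frames 1..3) is the (0, N) component.
whole_serial = PAB[0][3]
whole_parallel = PA_or_B[0][3]
```

In this sketch the (0, N) component of the result is the likelihood of the whole utterance, which corresponds to the total (trellis) likelihood. Replacing the sum in `serial` with a maximum would give a Viterbi-style best-boundary score, although how the paper itself handles the Viterbi case is not specified in the abstract.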


Bibliographic reference.  Sagayama, Shigeki (1991): "A matrix representation of HMM-based speech recognition algorithms", In EUROSPEECH-1991, 1225-1228.