First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes an HMM-based, speaker-independent word spotting system and its Transputer-based implementation. Candidate word end-points and their likelihood scores are computed with the continuous Viterbi decoding algorithm. To prune unreasonable candidates, a new duration control method, a threshold logic for the likelihood scores, and a new local peak detection method are proposed. The word spotting system is parallelized efficiently on a tree structure of Transputers. In each frame period, the spectral feature vector from the speech analyzer is broadcast from the root Transputer (Processing Master: PM) to the node Transputers (Processing Elements: PEs). Each PE performs the continuous Viterbi decoding and the pruning of candidates in parallel, and the spotting results are returned to the PM. With 8 PEs in a tree structure, 72 words can be processed within a 12 ms frame period. Word detection experiments, using the 10 Japanese digits spoken over a noisy telephone network, yield a word detection accuracy of 97%.
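The three pruning stages named in the abstract (duration control, score thresholding, local peak detection) can be illustrated with a minimal sketch. The abstract does not specify the proposed methods, so what follows is a generic textbook-style version, not the authors' algorithm. The `Candidate` fields, the `prune` and `assign_words` helpers, the duration-normalized log-likelihood score, and the round-robin split of the vocabulary across PEs are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    end_frame: int    # frame at which the word model's final state was reached
    start_frame: int  # start frame recovered from the best Viterbi path
    score: float      # assumed: duration-normalized log-likelihood (higher is better)


def prune(cands: List[Candidate], min_dur: int, max_dur: int,
          threshold: float) -> List[Candidate]:
    """Keep candidates that pass duration, threshold, and local-peak checks.

    `cands` holds one candidate per consecutive end frame for a single word.
    """
    kept = []
    for i, c in enumerate(cands):
        duration = c.end_frame - c.start_frame + 1
        if not (min_dur <= duration <= max_dur):
            continue  # implausible word length
        if c.score < threshold:
            continue  # likelihood score too low
        prev_score = cands[i - 1].score if i > 0 else float("-inf")
        next_score = cands[i + 1].score if i + 1 < len(cands) else float("-inf")
        # Local peak: strictly above the previous frame, not below the next one.
        if c.score > prev_score and c.score >= next_score:
            kept.append(c)
    return kept


def assign_words(words: List[str], n_pe: int) -> List[List[str]]:
    """Hypothetical round-robin split of the vocabulary across PEs."""
    return [words[i::n_pe] for i in range(n_pe)]


# Made-up candidate stream for one word, for illustration only.
cands = [
    Candidate(10, 2, -4.0),   # below threshold
    Candidate(11, 2, -2.5),   # passes all three checks
    Candidate(12, 3, -3.0),   # not a local peak
    Candidate(13, 12, -1.0),  # too short
    Candidate(14, 4, -6.0),   # below threshold
]
result = prune(cands, min_dur=5, max_dur=40, threshold=-3.5)
groups = assign_words([f"word{i}" for i in range(72)], n_pe=8)
```

In this sketch only the candidate ending at frame 11 survives. Under the assumed even split, the 72-word vocabulary gives 9 word models per PE, each of which would have to be decoded and pruned within one 12 ms frame period.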
Bibliographic reference. Imamura, Akihiro / Suzuki, Yoshitake (1990): "Speaker-independent word spotting and a transputer-based implementation", In ICSLP-1990, 537-540.