First European Conference on Speech Communication and Technology

Paris, France
September 27-29, 1989

Auditory-Based Filter-Bank Analysis as a Front-End Processor for Speech Recognition

Hiroshi Hamada (1), Tatsuya Hirahara (2), Akihiro Imamura (1), Tatsuo Matsuoka (1), Ryohei Nakatsu (1)

(1) NTT Human Interface Laboratories, Take, Yokosuka-shi, Kanagawa-ken, Japan
(2) ATR Auditory and Visual Perception Research Laboratories, Inuidani, Seika-cho, Kyoto, Japan

A comparison between speech analysis based on human auditory processing and conventional LPC analysis is described. The two types of parameters were compared on their ability to recognize fourteen consonants extracted from Japanese consonant-vowel (CV) syllables spoken in isolation. Three types of recognition algorithms were used: dynamic time warping with multiple template sets, hidden Markov models, and neural networks. The auditory filter bank consisted of 35 channels spanning 100 to 5400 Hz, each comprising a critical-band bandpass filtering process, a rectification process, an integration process, and a transformation into logarithmic form. A lateral inhibition process was also included in order to more closely simulate human auditory processing. The recognition experiments showed that parameters based on the features of human auditory processing are well suited to a variety of speech recognition methods.
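The per-channel pipeline described above (critical-band bandpass filtering, rectification, integration, logarithmic compression, then lateral inhibition across channels) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Bark approximation, ideal FFT bandpass masks, frame parameters, and the 0.25 inhibition weight are all assumptions made here for concreteness.

```python
import numpy as np

def bark(f):
    # A common Bark-scale approximation (an assumption; the paper does not
    # specify the exact critical-band formula used).
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def auditory_frontend(x, fs=16000, n_ch=35, f_lo=100.0, f_hi=5400.0,
                      frame_len=256, hop=128):
    """Per channel: bandpass -> half-wave rectify -> integrate -> log,
    followed by lateral inhibition across channels."""
    # Channel edges equally spaced on the Bark scale between 100 and 5400 Hz;
    # invert the Bark mapping numerically on a frequency grid.
    edges_bark = np.linspace(bark(f_lo), bark(f_hi), n_ch + 1)
    f_grid = np.linspace(0.0, fs / 2.0, 4097)
    edges_hz = np.interp(edges_bark, bark(f_grid), f_grid)

    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)

    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.empty((n_ch, n_frames))
    for c in range(n_ch):
        # 1) critical-band bandpass (ideal FFT mask, a simplification)
        mask = (freqs >= edges_hz[c]) & (freqs < edges_hz[c + 1])
        band = np.fft.irfft(spec * mask, n=len(x))
        # 2) half-wave rectification
        rect = np.maximum(band, 0.0)
        # 3) integration: average over short frames
        for t in range(n_frames):
            out[c, t] = rect[t * hop: t * hop + frame_len].mean()
    # 4) logarithmic compression (small floor avoids log(0))
    out = np.log(out + 1e-10)
    # 5) lateral inhibition: subtract a fraction of neighboring channels
    inhibited = out.copy()
    inhibited[1:-1] -= 0.25 * (out[:-2] + out[2:])
    return inhibited
```

Feeding in a short signal yields a 35-channel time-frequency pattern of the kind used as input to the recognizers compared in the paper.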


Bibliographic reference.  Hamada, Hiroshi / Hirahara, Tatsuya / Imamura, Akihiro / Matsuoka, Tatsuo / Nakatsu, Ryohei (1989): "Auditory-based filter-bank analysis as a front-end processor for speech recognition", In EUROSPEECH-1989, 2396-2399.