First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Auditory Spectrograms in HMM Phoneme Recognition

Tatsuya Hirahara, Hitoshi Iwamida

ATR Auditory and Visual Perception Research Laboratories, Kyoto, Japan

Several auditory spectrograms based on the adaptive Q cochlear filter and its relatives are compared in speaker-dependent HMM phoneme recognition tests using clean speech as well as speech degraded by added pink noise. These spectrograms are created using a filter bank, an inner hair cell (IHC) model, and a lateral inhibition (LINH) circuit, in different combinations. Eight filter banks of three filter types are prepared: (1) a simple band-pass filter with Qb=4.5 and 30, (2) a conventional fixed Q cochlear filter with Qb=4.5 and 30, and (3) an adaptive Q cochlear filter with feedback or feedforward control and a short or long adaptation time constant. Each filter bank is composed of 55 channel filters spaced by 1/3 Bark and spanning the frequency range from 1 to 18.7 Bark. The IHC model involves a saturated half-wave rectifier and a short-term adaptation circuit. The recognition task is to classify input tokens into 18 phoneme categories using 5,788 training tokens and 5,773 testing tokens. Results are as follows: (1) The adaptive Q cochlear filter with LINH gives better recognition performance than the other types of filter banks in all training/testing conditions. (2) The LINH effectively improves recognition performance. (3) The IHC model produces no benefit, even for the noisy data set.
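The filter bank layout described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the configuration labels, the function names, and the choice to place the 55 channel centers evenly between 1 and 18.7 Bark are hypothetical, and the Bark-to-Hz conversion uses Traunmüller's approximation, which the abstract does not specify.

```python
from itertools import product


def filter_bank_configs():
    """Enumerate the filter banks named in the abstract (labels are illustrative)."""
    configs = []
    # (1) simple band-pass filters and (2) fixed Q cochlear filters, each with two Qb values
    for filter_type in ("band-pass", "fixed Q cochlear"):
        for qb in (4.5, 30):
            configs.append((filter_type, f"Qb={qb}"))
    # (3) adaptive Q cochlear filters: control scheme x adaptation time constant
    for control, tau in product(("feedback", "feedforward"), ("short", "long")):
        configs.append(("adaptive Q cochlear", f"{control}, {tau} time constant"))
    return configs


def channel_centers_bark(n_channels=55, low=1.0, high=18.7):
    """Evenly spaced channel centers on the Bark scale (assumed placement)."""
    step = (high - low) / (n_channels - 1)
    return [low + i * step for i in range(n_channels)]


def bark_to_hz(z):
    """Assumption: inverse of Traunmueller's Bark approximation, not taken from the paper."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


configs = filter_bank_configs()
centers = channel_centers_bark()
step = centers[1] - centers[0]

print(len(configs), "filter banks")
print(len(centers), "channels, step of about", round(step, 3), "Bark")
print("lowest and highest centers in Hz:",
      round(bark_to_hz(centers[0])), round(bark_to_hz(centers[-1])))
```

Counting the configurations this way gives two band-pass, two fixed Q, and four adaptive Q variants, which matches the eight filter banks stated above. With evenly spaced centers, the step between adjacent channels comes out close to the stated 1/3 Bark.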


Bibliographic reference.  Hirahara, Tatsuya / Iwamida, Hitoshi (1990): "Auditory spectrograms in HMM phoneme recognition", In ICSLP-1990, 381-384.