Second International Conference on Spoken Language Processing (ICSLP'92)

Banff, Alberta, Canada
October 13-16, 1992

Spectral Mapping onto Probabilistic Domain Using Neural Networks and Its Application to Speaker Adaptive Phoneme Recognition

Tetsunori Kobayashi, Katsuhiko Shirai

Department of Electrical Engineering, Waseda University, Tokyo, Japan

A feature parameter space called PRPG (Probability Ratios between Phoneme Group pairs) is used for speaker-adaptive phoneme recognition. The coordinate conversion from the spectral domain is performed by neural networks. Each output node of the network represents the a posteriori probability of a phoneme group. Therefore, distance in the PRPG coordinate system corresponds directly to a difference in likelihood, and a region of the spectral space that carries the same information for speech recognition is compressed into one point. Moreover, by the definition of the coordinate system, the meanings of the axes are equivalent among different speakers, so speaker adaptation can be performed easily without trajectory mapping. The experimental results show that the scores of speaker-adaptive recognition in the PRPG domain are always superior to those of speaker-dependent recognition in the spectral domain.
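As a rough illustration of the idea, the sketch below maps a vector of phoneme-group posteriors to pairwise ratio coordinates. The abstract does not give the exact definition of the PRPG coordinates, so the use of log ratios over all unordered group pairs is an assumption, as are the function names and the hard-coded posterior values (which would come from the neural network outputs in practice).

```python
import math
from itertools import combinations


def prpg_coordinates(posteriors, eps=1e-12):
    """Map phoneme-group posteriors to pairwise log probability ratios.

    Assumption: one coordinate per unordered group pair (i, j), defined as
    log(p_i / p_j). The abstract does not specify this form.
    """
    # Clip to avoid division by zero or log(0) for very small posteriors.
    clipped = [max(p, eps) for p in posteriors]
    return [
        math.log(clipped[i] / clipped[j])
        for i, j in combinations(range(len(clipped)), 2)
    ]


def euclidean(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical posteriors over three phoneme groups for three frames.
frame_a = [0.7, 0.2, 0.1]
frame_b = [0.7, 0.2, 0.1]  # different spectrum, same posteriors (assumed)
frame_c = [0.1, 0.2, 0.7]

coords_a = prpg_coordinates(frame_a)
coords_b = prpg_coordinates(frame_b)
coords_c = prpg_coordinates(frame_c)

print(euclidean(coords_a, coords_b))
print(euclidean(coords_a, coords_c))
```

Under these assumptions, frames whose posteriors agree land on the same point regardless of how their spectra differ, while frames favouring different phoneme groups are far apart.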

Bibliographic reference.  Kobayashi, Tetsunori / Shirai, Katsuhiko (1992): "Spectral mapping onto probabilistic domain using neural networks and its application to speaker adaptive phoneme recognition", In ICSLP-1992, 385-388.