First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Speaker Adaptable Phoneme Recognition Selecting Reliable Acoustic Features based on Mutual Information

Katsuhiko Shirai, N. Hosaka, E. Kitagawa, T. Endou

Dept. of Electrical Engineering, Waseda University, Tokyo, Japan

This paper presents a statistical method for recognizing phonemes in continuous speech. Three aspects of the system are discussed. The first is speaker adaptation to improve the recognition rate. The second is the effective calculation of phoneme likelihood, especially for consonants in various phoneme environments. The third is an algorithm that modifies frame labels and derives the final phoneme decision from them using a duration and phoneme sequence rule. At the frame level, with a set of 100 words, the correct recognition rate over all phoneme categories reaches 80.42% in a multi-speaker experiment (6 males) and 74.76% in a completely speaker-independent experiment. Results are also reported for the ATR database, using a set of 3271 words of single-speaker data, of which 1370 words are used for training and 1901 words for the recognition test.
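
The abstract does not spell out the third step, so the following is only an illustrative sketch of how frame-level labels might be turned into a phoneme sequence with a simple duration rule. It is not the authors' algorithm: the function name `frames_to_phonemes`, the `min_frames` threshold, and the choice to merge short runs into the preceding segment are all assumptions made for illustration, and no phoneme sequence rule is modeled.

```python
from itertools import groupby


def frames_to_phonemes(frame_labels, min_frames=3):
    """Collapse per-frame labels into (phoneme, duration) segments.

    Hypothetical duration rule: a run of identical labels shorter than
    `min_frames` is treated as spurious and absorbed into the preceding
    segment. A short run with no preceding segment is dropped.
    """
    # Run-length encode the frame labels, e.g. [("a", 4), ("k", 1), ...].
    runs = [(label, len(list(group))) for label, group in groupby(frame_labels)]

    segments = []
    for label, length in runs:
        if length < min_frames:
            if segments:
                # Too short to be a phoneme: give its frames to the previous segment.
                prev_label, prev_len = segments[-1]
                segments[-1] = (prev_label, prev_len + length)
            # Otherwise there is nothing to merge into, so the run is dropped.
        elif segments and segments[-1][0] == label:
            # Same phoneme on both sides of an absorbed run: merge them.
            segments[-1] = (label, segments[-1][1] + length)
        else:
            segments.append((label, length))
    return segments


if __name__ == "__main__":
    frames = list("aaaakaaassss")  # one spurious "k" frame inside an "a" segment
    print(frames_to_phonemes(frames))
```

In this sketch the isolated "k" frame is absorbed and the two "a" runs around it are joined into a single segment; a real system would presumably choose the threshold per phoneme class and combine it with constraints on allowable phoneme sequences.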

Bibliographic reference.  Shirai, Katsuhiko / Hosaka, N. / Kitagawa, E. / Endou, T. (1990): "Speaker adaptable phoneme recognition selecting reliable acoustic features based on mutual information", In ICSLP-1990, 353-356.