Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
In this paper, we describe a new HMM-based multiphone method developed to reduce the amount of training data and the training time, and to restrain the effect of contextual holes. We define a multiphone as any phoneme combination consisting of fewer than four phonemes. To reduce the number of units, we train only the most frequently used multiphones instead of all possible multiphones. Recognition results are obtained by applying our method to the Japanese Common Speech Data Corpus. On the training vocabulary, our method achieves the same recognition accuracy as triphone HMMs. On the non-training vocabulary, our method reduces the error rate by as much as 70% compared with the triphone method. We also propose a two-stage search algorithm consisting of a pre-selection step followed by a detailed A* search. Compared with Viterbi beam search, the two-stage algorithm takes only 70% of the computing time without reducing recognition accuracy.
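The unit-selection idea can be illustrated with a small sketch. It assumes that a multiphone is a contiguous phoneme sequence of length 1 to 3 taken from each word's transcription, that "most frequently used" means the top-k sequences by count over the training lexicon, and that single phonemes are always kept so any word, including an unseen one, can still be decomposed. The helper names `count_multiphones`, `select_multiphones`, and `segment`, the parameter `k`, and the greedy longest-match segmentation are hypothetical illustrations, not the authors' procedure.

```python
from collections import Counter

MAX_LEN = 3  # multiphones contain fewer than four phonemes


def count_multiphones(lexicon):
    """Count every contiguous phoneme sequence of length 1..MAX_LEN (assumed definition)."""
    counts = Counter()
    for phonemes in lexicon:
        for n in range(1, MAX_LEN + 1):
            for i in range(len(phonemes) - n + 1):
                counts[tuple(phonemes[i:i + n])] += 1
    return counts


def select_multiphones(lexicon, k):
    """Keep all single phonemes plus the k most frequent longer units (hypothetical rule)."""
    counts = count_multiphones(lexicon)
    selected = {u for u in counts if len(u) == 1}
    # Sort by descending count, breaking ties by the unit itself for determinism.
    ranked = sorted((u for u in counts if len(u) > 1), key=lambda u: (-counts[u], u))
    selected.update(ranked[:k])
    return selected


def segment(phonemes, units):
    """Greedy longest-match split of a word into selected units (illustrative only)."""
    out, i = [], 0
    while i < len(phonemes):
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            unit = tuple(phonemes[i:i + n])
            if unit in units:
                out.append(unit)
                i += n
                break
        else:
            raise ValueError(f"no unit covers phoneme {phonemes[i]!r}")
    return out


if __name__ == "__main__":
    lexicon = [["k", "a", "t", "a"], ["k", "a", "s", "a"], ["a", "s", "a"]]
    units = select_multiphones(lexicon, k=3)
    # An unseen word falls back to shorter units where no longer one was selected.
    print(segment(["t", "a", "k", "a"], units))  # -> [('t',), ('a',), ('k', 'a')]
```

Keeping every single phoneme in the inventory is what lets the sketch cover words that never appeared in training, which is one plausible way a frequency-limited unit set could limit contextual holes.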
Bibliographic reference. Yi, Jie / Miki, Kei (1992): "A new method of speaker-independent speech recognition using multiphone HMM", In ICSLP-1992, 1471-1474.