First International Conference on Spoken Language Processing (ICSLP 90)
This paper describes preliminary results for a new method of training multi-phone units for discrete hidden Markov model (HMM) speech recognition systems. The context-sensitive multi-phone units, which may be poorly trained, are combined with smaller speech units by a weighting scheme that favors well-trained data. We tested this method on Japanese, using the multi-phone disyllable unit (a VCV pattern in Japanese) and the tripartite disyllable unit. A tripartite disyllable is composed of smaller speech units: a single consonant phone (in the case of Japanese) surrounded by two vowel demiphones (context-sensitive half phones). For speaker-dependent isolated-word recognition, with training on data from three recording sessions of the same continuous-speech training set, we obtained an average recognition performance of 96.6% for the merged system. This is a 9.1% improvement in the recognition rate over the standard disyllable system, and a 0.8% improvement over the tripartite disyllable system.
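The abstract does not state the weighting formula, so the following is only an illustrative sketch of one way a weighting scheme could favor well-trained data in a discrete HMM. It assumes count-based linear interpolation between the output distribution of a multi-phone unit and that of the corresponding smaller unit. The function name `merge_distributions`, the `threshold` parameter, and the interpolation rule itself are assumptions made for illustration and are not taken from the paper.

```python
def merge_distributions(multi_probs, multi_count, small_probs, threshold=50.0):
    """Interpolate two discrete output distributions over the same codebook.

    Hypothetical sketch: the weighting rule is an assumption, not the
    paper's formula.

    multi_probs : output probabilities of the multi-phone (disyllable) unit
    multi_count : number of training observations seen by that unit
    small_probs : output probabilities of the smaller (better-trained) unit
    threshold   : assumed count at which both sources are weighted equally
    """
    if len(multi_probs) != len(small_probs):
        raise ValueError("distributions must share the same codebook size")
    if multi_count < 0 or threshold <= 0:
        raise ValueError("multi_count must be >= 0 and threshold must be > 0")

    # The weight on the multi-phone unit grows with its training count:
    # 0 when it was never observed, approaching 1 when it is well trained.
    w = multi_count / (multi_count + threshold)

    merged = [w * p + (1.0 - w) * q for p, q in zip(multi_probs, small_probs)]

    # Renormalize to guard against rounding drift in the inputs.
    total = sum(merged)
    return [m / total for m in merged]


if __name__ == "__main__":
    # A rarely seen multi-phone unit leans heavily on the smaller unit.
    print(merge_distributions([0.9, 0.1], 5, [0.5, 0.5]))
    # A frequently seen multi-phone unit mostly keeps its own estimate.
    print(merge_distributions([0.9, 0.1], 5000, [0.5, 0.5]))
```

Under this assumed rule, a multi-phone unit with no training observations falls back entirely to the smaller unit, while a frequently observed one retains most of its context-sensitive estimate.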
Bibliographic reference. Goldstein, Jade / Amano, Akio / Murayama, Hideki / Izawa, Mariko / Ichikawa, Akira (1990): "A new training method for multi-phone speech units for use in a hidden Markov model speech recognition system", In ICSLP-1990, 1205-1208.