First International Conference on Spoken Language Processing (ICSLP 90)
This paper proposes mappings of supervisory signals in layered neural networks for lip-reading the five Japanese vowels, with the aim of improving recognition. Three feature parameters are selected: the width P1 and height P2 of the lip shape, and the distance P3 between the top of the upper lip and the bottom of the jaw. Mappings from the input vector of these three feature parameters to desired output vectors with various supervisory signals are discussed. The discussion rests on the similarity between the pentagonal vowel distribution in the F1-F2 formant diagram and that in the P1-P2 (P1-P3) diagram. In speaker-dependent lip-reading experiments using twenty test sets of the five vowels, mappings based on the spatial relationship between vowels achieve recognition rates several percent higher than mappings that disregard this relationship. Finally, a mapping that generates the desired relationship between vowels in the hidden layer is proposed, and its effectiveness is demonstrated.
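The abstract contrasts supervisory signals that ignore how the vowels relate to one another with signals that reflect their spatial relationship. One hypothetical way to build such target vectors is sketched below. It assumes the vowels sit in the order i, e, a, o, u around the pentagon and that each vowel's neighbours receive a partial target value (`neighbor_level`). Both choices, and all function names, are illustrative assumptions and are not the mappings used in the paper.

```python
import numpy as np

# Hypothetical ordering of the five Japanese vowels around the pentagon
# (assumed for illustration): i -> e -> a -> o -> u -> back to i.
VOWELS = ["i", "e", "a", "o", "u"]


def one_hot_targets():
    """Supervisory signals that disregard relationships between vowels."""
    return np.eye(len(VOWELS))


def pentagon_targets(neighbor_level=0.5):
    """Supervisory signals encoding adjacency on the assumed pentagon.

    Each vowel's own output unit gets 1.0, and the units of its two
    neighbours on the pentagon get `neighbor_level` (a hypothetical knob).
    """
    n = len(VOWELS)
    targets = np.eye(n)
    for k in range(n):
        targets[k, (k - 1) % n] = neighbor_level  # previous vowel on the ring
        targets[k, (k + 1) % n] = neighbor_level  # next vowel on the ring
    return targets


def classify(output):
    """Return the vowel whose output unit responds most strongly."""
    return VOWELS[int(np.argmax(output))]


if __name__ == "__main__":
    print(pentagon_targets())
    print(classify([0.1, 0.9, 0.2, 0.0, 0.0]))
```

Under this sketch, a network trained toward `pentagon_targets()` is pushed to give similar outputs to vowels that are close on the pentagon. A network trained toward `one_hot_targets()` treats every pair of vowels as equally distinct.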
Bibliographic reference. Watanabe, Tomio / Kohda, Masaki (1990): "Lip-reading of Japanese vowels using neural networks", In ICSLP-1990, 1373-1376.