A system for rule-based audiovisual text-to-speech synthesis has been created. The system is based on the KTH text-to-speech system, which has been complemented with a three-dimensional parameterized model of a human face. The face can be animated in real time, synchronized with the auditory speech. The facial model is controlled by the same synthesis software as the auditory speech synthesizer. A set of rules that takes coarticulation into account has been developed. The audiovisual text-to-speech system has also been incorporated into a spoken man-machine dialogue system that is being developed at the department.
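The abstract does not detail the coarticulation rules themselves. A minimal sketch of one common rule-based scheme for parametric face models, dominance-weighted blending of per-phoneme articulatory targets, could look like the following; every phoneme, target value, and constant here is an illustrative assumption, not taken from the paper:

```python
import math

# Hypothetical per-phoneme targets for one facial parameter
# (e.g. lip rounding, scaled 0-1). Values are illustrative only.
SEGMENTS = [
    # (phoneme, centre_time_s, target_value, dominance_strength)
    ("s", 0.00, 0.1, 1.0),
    ("u", 0.12, 0.9, 2.0),  # rounded vowel dominates its neighbourhood
    ("t", 0.30, 0.2, 0.8),
]

RATE = 10.0  # decay rate of a segment's influence over time (1/s)

def dominance(t, centre, strength):
    """Exponentially decaying influence of a segment, peaking at its centre."""
    return strength * math.exp(-RATE * abs(t - centre))

def parameter_at(t):
    """Blend all segment targets, weighted by their dominance at time t."""
    num = den = 0.0
    for _, centre, target, strength in SEGMENTS:
        w = dominance(t, centre, strength)
        num += w * target
        den += w
    return num / den

# Sampling at frame times yields a smooth parameter trajectory in which
# the rounded vowel's influence spreads into the adjacent consonants.
track = [parameter_at(i / 50.0) for i in range(20)]
```

Because every segment contributes at every instant, the rounded vowel raises the parameter during the surrounding consonants as well, which is the coarticulatory spreading a purely segment-by-segment lookup would miss.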
Bibliographic reference: Beskow, Jonas (1995): "Rule-based visual speech synthesis", Proc. EUROSPEECH 1995, pp. 299-302.