Auditory-Visual Speech Processing (AVSP'98)
December 4-6, 1998
Today, synthetic speech is often based on concatenation of natural speech: units such as diphones or polyphones are extracted from natural recordings and joined to form any word or sentence. So far, there have mainly been two ways of adding a visual modality to such a synthesis: morphing between single images, or concatenating video sequences. In this study, however, a new method is presented in which recorded natural movements of points on the face are used to control an animated face.
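The approach described above pairs each concatenative unit with its corresponding facial-point trajectories, so that joining the units produces both the acoustic and the visual stream. A minimal sketch, using purely illustrative data structures (the field names and unit format are assumptions, not the authors' actual representation):

```python
def concatenate_units(units):
    """Join speech units (e.g. diphones) into one utterance.

    Each unit is a dict with 'audio' (a list of waveform samples)
    and 'markers' (a list of per-frame facial point positions).
    These names are hypothetical; they only illustrate the idea of
    carrying audio and facial motion through the same concatenation.
    """
    audio, markers = [], []
    for unit in units:
        audio.extend(unit['audio'])      # append the unit's waveform
        markers.extend(unit['markers'])  # append its facial-point frames
    return audio, markers

# Toy example: two "diphones" of an imaginary word.
d1 = {'audio': [0.1, 0.2], 'markers': [(0, 0), (1, 1)]}
d2 = {'audio': [0.3], 'markers': [(2, 2)]}
audio, markers = concatenate_units([d1, d2])
```

In a real system the audio and marker streams would of course be smoothed at the unit boundaries; this sketch only shows how the recorded facial movements can travel with the speech units they were cut from.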
Bibliographic reference. Hällgren, Åsa / Lyberg, Bertil (1998): "Visual speech synthesis with concatenative speech", in AVSP-1998, 181-184.
Multimedia files:
av98_181_1.mov (12941 KB), source file 0045_01.mov — Example of concatenative visual speech synthesis. Video file: QuickTime; 320x240, 25 Hz, 24 bits per pixel, RLE-compressed.
av98_181_2.mov (16519 KB), source file 0045_02.mov — Example of concatenative visual speech synthesis. Video file: QuickTime; 320x240, 25 Hz, 24 bits per pixel, RLE-compressed.