FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and
Auditory-Visual Speech Processing

Vienna, Austria
September 11-13, 2015

Comparison of Dialect Models and Phone Mappings in HSMM-Based Visual Dialect Speech Synthesis

Dietmar Schabus, Michael Pucher

FTW Telecommunications Research Center Vienna, Austria

In this paper we evaluate two methods for the visual synthesis of Austrian German dialects with parametric Hidden Semi-Markov Model (HSMM) based speech synthesis. One method uses visual dialect data, i.e., visual dialect recordings annotated with dialect phonetic labels; the other uses a standard visual model and maps dialect phones to standard phones. The second method is more easily applicable, since visual dialect data is usually not available. Both methods employ contextual information via decision-tree-based visual clustering of dialect or standard visual data. We show that both models achieve similar performance in a subjective pair-wise comparison test. This indicates that visual dialect data is not necessarily needed for visual modeling of dialects if a dialect-to-standard mapping can be used that exploits the contextual information of the standard language.

Index Terms: visual speech synthesis, dialect
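The phone-mapping method described above substitutes standard-language phones for dialect phones before synthesizing with the standard visual model. A minimal sketch of such a substitution step, assuming a simple one-to-one mapping table (the phone labels below are hypothetical and not the Austrian German inventory used in the paper):

```python
# Hedged sketch of a dialect-to-standard phone mapping, applied before
# synthesis with a standard visual model. The labels are illustrative
# placeholders, not the actual dialect phone set from the paper.
DIALECT_TO_STANDARD = {
    "oa": "o:",  # hypothetical dialect diphthong -> standard long vowel
    "ei": "ai",  # hypothetical vowel-quality substitution
}

def map_to_standard(dialect_phones):
    """Replace each dialect phone with its standard counterpart;
    phones shared with the standard language pass through unchanged."""
    return [DIALECT_TO_STANDARD.get(p, p) for p in dialect_phones]

print(map_to_standard(["g", "oa", "n"]))  # ['g', 'o:', 'n']
```

The mapped phone sequence can then be looked up in the standard model's decision trees, which is how the second method exploits the standard language's contextual clustering.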


Bibliographic reference: Schabus, Dietmar / Pucher, Michael (2015): "Comparison of dialect models and phone mappings in HSMM-based visual dialect speech synthesis", in FAAVSP-2015, 84-87.