4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Modern speech synthesis systems with very high intelligibility are readily available in a number of languages. However, the output from all present systems is still easily identifiable as machine-generated: it does not sound "natural". One aspect of naturalness is the variability introduced by the emotional state of the speaker, and related pragmatic effects; no current commercial systems include such variation. Comparatively little work has been done to investigate how a speaker's emotional state creates variation in the speech signal, and this work has traditionally been performed by psychologists and has remained distinct from mainstream speech science. Current research suggests that considerable effort will be involved in producing any accurate description of pragmatic variations in speech, but interest in this area has recently increased because of potential applications in many branches of speech technology. This paper describes a prototype system which has been constructed to simulate emotion in speech synthesized by rule. The system is based on emotion information from the literature, and it simulates a range of emotions using a commercial synthesiser. The use of emotion models and their applicability to speech technology are discussed, along with the limitations of our current knowledge of vocal emotion, and suggestions are presented for future research in this area.
Bibliographic reference. Murray, Iain R. / Arnott, John L. (1996): "Synthesizing emotions in speech: is it time to get excited?", In ICSLP-1996, 1816-1819.