Speech Synthesis
Introduction
Speech synthesis has a long history. Andrew Maas gives a good introduction in the May 17 lecture of the Stanford Spoken Language Processing course.
Simon King's Course on Speech Synthesis is a comprehensive multimedia presentation.
ISCA's Speech Synthesis Special Interest Group (SynSIG) compiles lists of software tools and educational materials.
Most attention has been paid to ‘Text-to-Speech’ applications, i.e. type in the words you want and have them spoken for you.
Speech synthesis systems are evaluated in terms of intelligibility (how many words are correctly identified by listeners?) and naturalness (to what extent does the synthesis resemble a normal human voice?).
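As a rough illustration of the intelligibility measure, the sketch below scores the proportion of reference words that appear, in order, in each listener's transcript. It is only one plausible way to count "correctly identified" words. The function names `words_correct` and `intelligibility` are hypothetical, and real listening tests typically use more careful alignment and text normalisation.

```python
from difflib import SequenceMatcher
from statistics import mean


def words_correct(reference: str, transcript: str) -> float:
    """Fraction of reference words matched, in order, by the transcript."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # SequenceMatcher aligns the two word lists; the matching blocks
    # cover the words that listener and reference have in common.
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def intelligibility(reference: str, transcripts: list) -> float:
    """Mean word-level score over all listeners (an assumed aggregation)."""
    return mean(words_correct(reference, t) for t in transcripts)
```

For example, `words_correct("the cat sat on the mat", "the cat sat on a mat")` counts five of the six reference words as identified. Naturalness, by contrast, is a subjective judgement and cannot be computed from transcripts in this way.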
Visual overview of SCOOT Synthesis Topics.
Text to Speech
Synthesis from written text (orthography) involves two stages:
- Text Analysis: transform the text into an intermediate representation
- Waveform Generation: render the speech acoustics from the intermediate representation
To a large extent these stages are independent.
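The independence of the two stages can be sketched in code: the stages communicate only through the intermediate representation, so either one can be replaced without touching the other. Everything below is a toy under stated assumptions. The `Segment` type, the two-word `TOY_LEXICON`, the fixed segment duration, and the sine-tone "renderer" are all hypothetical placeholders, not how any real synthesiser works.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One unit of the intermediate representation (hypothetical)."""
    phone: str
    duration_s: float


# Toy pronunciation lexicon; a real system would use a full dictionary
# plus letter-to-sound rules and prosody prediction.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def analyse_text(text: str) -> list:
    """Stage 1: text -> intermediate representation (list of Segments)."""
    segments = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        # Words missing from the toy lexicon are silently skipped.
        for phone in TOY_LEXICON.get(word, []):
            segments.append(Segment(phone, 0.08))
    return segments


def generate_waveform(segments: list, sample_rate: int = 16000) -> list:
    """Stage 2: intermediate representation -> audio samples.

    Placeholder renderer: each phone becomes a sine tone whose
    frequency is derived from the phone label. This is NOT speech.
    """
    samples = []
    for seg in segments:
        freq = 200.0 + 20.0 * (sum(map(ord, seg.phone)) % 20)
        n = round(seg.duration_s * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)
        )
    return samples


if __name__ == "__main__":
    representation = analyse_text("Hello, world!")
    audio = generate_waveform(representation)
    print(len(representation), "segments,", len(audio), "samples")
```

Because `generate_waveform` sees only `Segment` objects, a different waveform generator could be substituted without changing `analyse_text`, and vice versa.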
For an introduction to Text Analysis, see
- the May 17 lecture of the Stanford course by Andrew Maas
- this introduction to text analysis in Edinburgh's speech.zone
- this introduction to pronunciation and prosody in Edinburgh's speech.zone
- Kim Silverman’s 2013 talk at ICSI
Waveform generation methods subdivide into
- Classic Speech Synthesis (vocal tract simulation),
- Concatenative Synthesis,
- HMM-based Synthesis and
- DNN Synthesis (synthesis based on neural networks).
Synthesis Toolkits
These toolkits provide software for generating synthetic speech.