Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
This paper describes the application of the Harmonic plus Noise Model (HNM) to concatenative Text-to-Speech (TTS) synthesis. In HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. Decomposing the speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., source and filter modifications). The parametric representation of speech using HNM also provides a straightforward way of smoothing discontinuities between acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other synthesis models (e.g., TD-PSOLA) in intelligibility, naturalness and pleasantness.
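As a rough illustration of the "harmonic component plus modulated noise component" idea, the following minimal sketch synthesizes one short frame. It is not the paper's analysis or synthesis procedure: the function name `hnm_frame`, its parameters, the constant per-frame amplitudes and phases, and the raised-cosine noise envelope are all assumptions made for this example.

```python
import math
import random


def hnm_frame(f0, amps, phases, noise_gain,
              sample_rate=16000, n_samples=160, seed=0):
    """Return one frame: harmonics of f0 plus time-modulated white noise.

    Hypothetical helper for illustration only; amplitudes and phases are
    held constant over the frame, which a real system would not assume.
    """
    rng = random.Random(seed)
    period = sample_rate / f0  # pitch period in samples
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        # Harmonic part: sinusoids at integer multiples of f0.
        harmonic = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amps, phases))
        )
        # Noise part: white Gaussian noise shaped by a pitch-synchronous
        # envelope (the raised-cosine shape is an assumption).
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / period))
        noise = noise_gain * envelope * rng.gauss(0.0, 1.0)
        frame.append(harmonic + noise)
    return frame


# Example: 10 ms at 16 kHz, two harmonics of a 100 Hz fundamental.
frame = hnm_frame(100.0, [1.0, 0.5], [0.0, 0.0], noise_gain=0.1)
```

Because each component has its own parameters, a modification such as changing `f0` touches only the harmonic frequencies and the envelope period, which is the kind of separation the abstract credits for more natural-sounding modifications.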
Bibliographic reference. Stylianou, Yannis (1998): "Concatenative Speech Synthesis using a Harmonic plus Noise Model", In SSW3-1998, 261-266.