EUROSPEECH 2001 Scandinavia
This paper describes improvements to the excitation model of an HMM-based text-to-speech system. In our previous work, natural spectral and pitch parameters were generated from HMMs using a speech parameter generation algorithm. However, the synthesized speech had the typical quality of "vocoded speech" because the system used a traditional excitation model consisting of either a periodic impulse train or white noise. In this paper, to reduce this synthetic quality, the mixed excitation model used in MELP is incorporated into the system. The excitation parameters used in mixed excitation are modeled by HMMs and, in the synthesis phase, generated from the HMMs by a parameter generation algorithm. The results of a listening test show that the mixed excitation model significantly improves the quality of synthesized speech compared with the traditional excitation model.
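As a rough illustration of the contrast drawn in the abstract, the sketch below compares a traditional excitation (a hard switch between a periodic impulse train and white noise) with a simplified mixed excitation (a weighted sum of the two). It is not the paper's implementation: the function names, the single full-band `voicing_strength` weight, and the power-complementary gains are all assumptions made for brevity. MELP-style coders mix pulse and noise per frequency band, which is omitted here.

```python
import math
import random


def pulse_train(n_samples, period):
    """Periodic impulse train, scaled so the average power over one period is 1."""
    amplitude = math.sqrt(period)
    return [amplitude if i % period == 0 else 0.0 for i in range(n_samples)]


def white_noise(n_samples, rng):
    """Zero-mean, unit-variance Gaussian white noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def traditional_excitation(n_samples, period, voiced, rng):
    """Hard voiced/unvoiced switch: pulses if voiced, noise otherwise."""
    if voiced:
        return pulse_train(n_samples, period)
    return white_noise(n_samples, rng)


def mixed_excitation(n_samples, period, voicing_strength, rng):
    """Simplified (full-band) mix of pulse train and noise.

    voicing_strength is a hypothetical weight in [0, 1]:
    1.0 gives pure pulses, 0.0 gives pure noise.
    """
    if not 0.0 <= voicing_strength <= 1.0:
        raise ValueError("voicing_strength must lie in [0, 1]")
    # Power-complementary gains (an assumption of this sketch).
    pulse_gain = math.sqrt(voicing_strength)
    noise_gain = math.sqrt(1.0 - voicing_strength)
    pulses = pulse_train(n_samples, period)
    noise = white_noise(n_samples, rng)
    return [pulse_gain * p + noise_gain * n for p, n in zip(pulses, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    frame = mixed_excitation(n_samples=80, period=20, voicing_strength=0.7, rng=rng)
    print(len(frame))
```

In a system like the one described, a weight of this kind would be produced per frame by the parameter generation step rather than fixed by hand; the constant 0.7 above is purely illustrative.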
Bibliographic reference. Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (2001): "Mixed excitation for HMM-based speech synthesis", In EUROSPEECH-2001, 2263-2266.