EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Mixed Excitation for HMM-based Speech Synthesis

Takayoshi Yoshimura (1), Keiichi Tokuda (1), Takashi Masuko (2), Takao Kobayashi (2), Tadashi Kitamura (1)

(1) Nagoya Institute of Technology, Japan
(2) Tokyo Institute of Technology, Japan

This paper describes improvements to the excitation model of an HMM-based text-to-speech system. In our previous work, natural spectral and pitch parameters were generated from HMMs using a speech parameter generation algorithm. However, the synthesized speech had the typical quality of "vocoded speech", since the system used a traditional excitation model consisting of either a periodic impulse train or white noise. In this paper, to reduce this synthetic quality, the mixed excitation model used in MELP (Mixed Excitation Linear Prediction) is incorporated into the system. The excitation parameters used in mixed excitation are modeled by HMMs and, in the synthesis phase, generated from the HMMs by the parameter generation algorithm. The result of a listening test shows that the mixed excitation model significantly improves the quality of synthesized speech compared with the traditional excitation model.
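To make the contrast between the two excitation models concrete, here is a minimal single-frame sketch. The traditional model switches between an impulse train (voiced) and white noise (unvoiced). The mixed model instead blends the two in each frequency band, weighting the pulse by a band voicing strength v and the noise by 1 - v. Several details are assumptions, not taken from the paper: the five band edges (0 to 500, 500 to 1000, 1000 to 2000, 2000 to 3000, 3000 to 4000 Hz, as in MELP for 8 kHz speech), mixing via a naive DFT rather than time-domain bandpass filters, and the absence of pulse jitter or pulse shaping. All function names are hypothetical, and the per-band strengths are simply passed in, whereas in the described system they would be generated from HMMs.

```python
import cmath
import math
import random

# Assumed band edges in Hz: the five MELP bands for 8 kHz speech.
BAND_EDGES_HZ = [0, 500, 1000, 2000, 3000, 4000]


def dft(x):
    """Naive O(n^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; only the real part is kept because the input spectra are Hermitian."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_index(freq_hz):
    """Map a frequency to the index of the band that contains it."""
    for b in range(len(BAND_EDGES_HZ) - 1):
        if freq_hz < BAND_EDGES_HZ[b + 1]:
            return b
    return len(BAND_EDGES_HZ) - 2  # the Nyquist frequency falls into the top band


def pulse_train(n, f0_hz, fs):
    """Periodic impulse train with about unit average power (all zeros if unvoiced)."""
    x = [0.0] * n
    if f0_hz <= 0:
        return x
    period = fs / f0_hz
    pos = 0.0
    while pos < n:
        x[int(pos)] = math.sqrt(period)
        pos += period
    return x


def binary_excitation(n, f0_hz, fs, rng):
    """Traditional model: impulse train if voiced, white Gaussian noise otherwise."""
    if f0_hz > 0:
        return pulse_train(n, f0_hz, fs)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mixed_excitation(n, f0_hz, fs, strengths, rng):
    """Mix pulse and noise per band: strength v gives v * pulse + (1 - v) * noise."""
    pulse_spec = dft(pulse_train(n, f0_hz, fs))
    noise_spec = dft([rng.gauss(0.0, 1.0) for _ in range(n)])
    mixed_spec = []
    for k in range(n):
        # Bins k and n - k share a frequency, so the spectrum stays Hermitian
        # and the time-domain output is real.
        freq_hz = min(k, n - k) * fs / n
        v = strengths[band_index(freq_hz)]
        mixed_spec.append(v * pulse_spec[k] + (1.0 - v) * noise_spec[k])
    return idft(mixed_spec)


if __name__ == "__main__":
    # One 20 ms frame at 8 kHz, F0 = 100 Hz: voiced low bands, noisy high bands.
    frame = mixed_excitation(160, 100.0, 8000, [1.0, 0.9, 0.6, 0.2, 0.0], random.Random(0))
    print(len(frame))
```

With all strengths set to 1 the mixed model reduces to the impulse train, and with all strengths set to 0 it reduces to white noise, so the traditional binary model is the special case where every band shares the same hard decision. Frequency-domain mixing on a single frame was chosen here only for readability; a practical synthesizer would process successive frames and interpolate parameters between them.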

Bibliographic reference.  Yoshimura, Takayoshi / Tokuda, Keiichi / Masuko, Takashi / Kobayashi, Takao / Kitamura, Tadashi (2001): "Mixed excitation for HMM-based speech synthesis", In EUROSPEECH-2001, 2263-2266.