13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Amplitude Spectrum Based Excitation Model for HMM-based Speech Synthesis

Zhengqi Wen, Jianhua Tao

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

This paper describes an excitation model based on the amplitude spectrum for the HMM-based speech synthesis system (HTS). The residual signal obtained by inverse filtering is decomposed into periodic and aperiodic spectra in the frequency domain. An amplitude spectrum of half the pitch period in length is retained as the periodic component for the synthesis stage, where the zero-phase criterion and the pitch synchronous overlap-add method (PSOLA) are used to reconstruct the residual signal. Before this excitation model is integrated into HTS, the periodic spectra are normalized and the Linde-Buzo-Gray (LBG) algorithm is used to construct codebooks for every Mandarin final. The indices into these codebooks, which serve as the excitation information, are then included in HTS training together with the spectral, F0 and aperiodic parameters. A listening test showed that, for a female voice, the analysis-synthesis quality of the vocoder based on the proposed excitation model is comparable to that of STRAIGHT, and that the quality of the generated speech is also improved when the model is integrated into HTS.
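The zero-phase reconstruction step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes NumPy, a one-sided amplitude spectrum with period // 2 + 1 bins per pitch cycle, and plain overlap-add at given pitch marks (a simplification of PSOLA, with no windowing or pitch modification). The function names zero_phase_pulse and overlap_add are hypothetical.

```python
import numpy as np


def zero_phase_pulse(amplitude, period):
    """Turn a one-sided amplitude spectrum into one pitch-cycle pulse.

    Assumption: `amplitude` holds period // 2 + 1 non-negative values.
    Zero phase means the spectrum is purely real, so the inverse
    transform is symmetric around sample 0.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    pulse = np.fft.irfft(amplitude, n=period)
    # Rotate so the symmetric pulse is centred in the frame.
    return np.fft.fftshift(pulse)


def overlap_add(pulses, marks, length):
    """Add each pulse into the output, centred on its pitch mark."""
    out = np.zeros(length)
    for pulse, mark in zip(pulses, marks):
        start = mark - len(pulse) // 2
        lo = max(start, 0)
        hi = min(start + len(pulse), length)
        # Clip the pulse at the signal boundaries.
        out[lo:hi] += pulse[lo - start:hi - start]
    return out


# Example: a flat amplitude spectrum yields a unit impulse per pitch mark.
period = 80
flat = np.ones(period // 2 + 1)
pulse = zero_phase_pulse(flat, period)
residual = overlap_add([pulse, pulse], marks=[100, 180], length=300)
```

In this sketch a flat spectrum is used only because its result is easy to check by hand. In the setting the abstract describes, the amplitude values would instead come from the codebook entry selected for the current frame.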

Index Terms: speech synthesis, HMM-based speech synthesis, excitation model, amplitude spectrum


Bibliographic reference. Wen, Zhengqi / Tao, Jianhua (2012): "Amplitude spectrum based excitation model for HMM-based speech synthesis", in INTERSPEECH-2012, 1428-1431.