13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis

Zhengqi Wen (1), Hideki Kawahara (2), Jianhua Tao (1)

(1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
(2) Faculty of Systems Engineering, Wakayama University, Wakayama, Japan

A typical problem in LPC-like vocoders is a buzzy sound quality, caused mainly by the simple pulse-train or noise excitation model. One way to reduce it is to reconstruct the residual obtained from inverse filtering. This paper therefore proposes a new parametric representation of speech based on pitch-scaled analysis. Pitch-scaled analysis is used to extract the periodic spectrum of the residual with a length of half a pitch period. These periodic spectra are then decorrelated by principal component analysis (PCA) to reduce their dimension. An aperiodic measure is defined as the harmonic-to-noise ratio in the frequency domain, where the voicing cut-off frequency (VCO) is used to control the smoothness of the aperiodicity. The periodic spectrum and the aperiodic measure, together with F0, serve as the excitation parameters of the proposed LPC vocoder. Experimental results show that the proposed vocoder achieves a mean opinion score (MOS) of 4.1 for a female voice before dimensionality reduction and retains its high quality after parameter compression.
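
The PCA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (pca_fit, pca_encode, pca_decode), the matrix shape, the number of retained components, and the synthetic data standing in for per-frame periodic spectra are all assumptions made for illustration.

```python
import numpy as np

def pca_fit(spectra, n_components):
    """Fit PCA to a (n_frames, n_bins) matrix of per-frame spectra.

    Returns the mean spectrum, the top principal directions, and the
    fraction of total variance they explain. All names are hypothetical.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return mean, basis, explained

def pca_encode(spectra, mean, basis):
    """Project spectra onto the retained directions (decorrelated codes)."""
    return (spectra - mean) @ basis.T

def pca_decode(codes, mean, basis):
    """Reconstruct approximate spectra from the low-dimensional codes."""
    return codes @ basis + mean

# Synthetic stand-in for periodic spectra (assumption: 200 frames, 64 bins,
# generated from 5 latent factors plus a little noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 64))
spectra = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

mean, basis, explained = pca_fit(spectra, n_components=5)
codes = pca_encode(spectra, mean, basis)
recon = pca_decode(codes, mean, basis)
rms_error = float(np.sqrt(np.mean((recon - spectra) ** 2)))
```

In this sketch each 64-bin spectrum is represented by 5 coefficients. For real residual spectra, the number of components to keep would have to be chosen by listening tests or a variance threshold, which the abstract does not specify.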

Index Terms: speech parametric representation, pitch-scaled analysis, voicing cut-off frequency, principal component analysis


Bibliographic reference. Wen, Zhengqi / Kawahara, Hideki / Tao, Jianhua (2012): "Pitch-scaled analysis based residual reconstruction for speech analysis and synthesis", in INTERSPEECH-2012, 374-377.