Non-filter waveform generation from cepstrum using spectral phase reconstruction

Yasuhiro Hamada, Nobutaka Ono, Shigeki Sagayama

This paper discusses non-filter waveform generation from cepstral features using spectral phase reconstruction as an alternative to the conventional source-filter model in text-to-speech (TTS) systems. Since the primary purpose of the filter is to produce a waveform with the desired spectral shape, one possible alternative to the source-filter framework is to convert the designed spectrum directly into a waveform using recently developed “phase reconstruction” from the power spectrogram. Given cepstral features and fundamental frequency (F0) as the desired spectrum from a TTS system, the spectrum to be heard by the listener is calculated by converting the cepstral features into a linear-scale power spectrum and multiplying it by the pitch structure of F0. The signal waveform is then generated from the power spectrogram by spectral phase reconstruction. An advantage of the proposed method is that it is free from the undesired amplitude and long time decay often caused by sharp resonances in recursive filters. In preliminary experiments, we compared the temporal and gain characteristics of speech synthesized with the proposed method and with the mel-log spectrum approximation (MLSA) filter. The results show that the proposed method performed better than the MLSA filter in both characteristics, and suggest that it has desirable properties for speech synthesis.
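The processing chain described in the abstract (cepstrum to linear-scale power spectrum, multiplication by the pitch structure, phase reconstruction) can be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the paper: a plain linear-frequency real cepstrum is used instead of mel-cepstral features, the pitch structure is modeled as a hypothetical comb of Gaussian peaks at the harmonics of F0, and the classic Griffin-Lim iteration stands in for the phase reconstruction method the authors actually use. All function names and parameters (`harmonic_comb`, `width_hz`, `n_iter`, and so on) are made up for this sketch.

```python
import numpy as np

def cepstrum_to_power_spectrum(cep, n_fft):
    """Real cepstrum c[0..M-1] -> linear-scale power spectrum (n_fft//2+1 bins).
    Assumes the convention log|H| = FFT of the symmetric cepstrum."""
    sym = np.zeros(n_fft)
    sym[:len(cep)] = cep
    idx = np.arange(1, len(cep))
    sym[n_fft - idx] = cep[1:]            # mirror so the FFT is purely real
    log_amp = np.fft.rfft(sym).real
    return np.exp(2.0 * log_amp)          # power = |H|^2

def harmonic_comb(f0, sr, n_fft, width_hz=20.0):
    """Hypothetical pitch structure: Gaussian peaks at k*f0 (k >= 1).
    Unvoiced frames (f0 <= 0) get a flat structure."""
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    if f0 <= 0:
        return np.ones_like(freqs)
    k = np.maximum(1, np.round(freqs / f0))
    return np.exp(-0.5 * ((freqs - k * f0) / width_hz) ** 2)

def stft(x, n_fft, hop, win):
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft, hop, win):
    """Weighted overlap-add inverse of stft()."""
    n_frames = spec.shape[0]
    x = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(x)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        x[i * hop:i * hop + n_fft] += frames[i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_fft, hop, n_iter=50, seed=0):
    """Griffin-Lim: alternate between the target magnitude and a consistent STFT."""
    rng = np.random.default_rng(seed)
    win = np.hanning(n_fft)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))   # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop, win)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop, win)))
    return istft(mag * phase, n_fft, hop, win)

def synthesize(cep_frames, f0_frames, sr=16000, n_fft=512, hop=128):
    """Per-frame cepstrum and F0 -> waveform, with no recursive filter involved."""
    power = np.stack([
        cepstrum_to_power_spectrum(c, n_fft) * harmonic_comb(f, sr, n_fft)
        for c, f in zip(cep_frames, f0_frames)
    ])
    return griffin_lim(np.sqrt(power), n_fft, hop)

# Usage: 20 frames of a simple spectral envelope at a constant 120 Hz pitch.
cep_frames = np.zeros((20, 3))
cep_frames[:, 0] = -1.0
waveform = synthesize(cep_frames, np.full(20, 120.0))
```

Because the waveform is obtained by inverting a spectrogram rather than by exciting a recursive filter, the sketch has no filter state that could ring or blow up in amplitude, which is the property the abstract highlights.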

DOI: 10.21437/SSW.2016-5

Cite as

Hamada, Y., Ono, N., Sagayama, S. (2016) Non-filter waveform generation from cepstrum using spectral phase reconstruction. Proc. 9th ISCA Speech Synthesis Workshop, 27-31.

author={Yasuhiro Hamada and Nobutaka Ono and Shigeki Sagayama},
title={Non-filter waveform generation from cepstrum using spectral phase reconstruction},
booktitle={9th ISCA Speech Synthesis Workshop},