Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis

Alexandros Lazaridis, Milos Cernak, Pierre-Edouard Honnet, Philip N. Garner


In our recent work, a novel speech synthesis with enhanced prosody (SSEP) system using probabilistic amplitude demodulation (PAD) features was introduced. These features were used to improve prosody in speech synthesis. PAD was applied iteratively to generate syllable and stress amplitude modulations in a cascade. The PAD features were used as a secondary input alongside the standard text-based input features in deep neural network (DNN) speech synthesis. Objective and subjective evaluations confirmed an improvement in the quality of the synthesized speech. In this paper, a spectral amplitude modulation phase hierarchy (S-AMPH) technique is used in a speech synthesis scheme similar to the PAD-based one. Instead of the two modulations used in the PAD case, three modulations, i.e., stress-, syllable- and phoneme-level ones (2, 5 and 20 Hz respectively), are implemented with the S-AMPH model. The objective evaluation shows that the proposed system using the S-AMPH features improves synthetic speech quality with respect to the system using the PAD features: the relative reduction in mel-cepstral distortion (MCD) is approximately 9%, and the relative reduction in root mean square error (RMSE) of the fundamental frequency (F0) is approximately 25%. Multi-task training is also investigated in this work, but gives no statistically significant improvements.
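
The objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or alignment procedure used in the paper, so the code below assumes the commonly used MCD definition (in dB, over time-aligned frames with the energy coefficient already removed) and an F0 RMSE computed only over frames that are voiced in both signals. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB between two time-aligned lists of cepstral frames.

    Assumption: frames are already aligned and the 0th (energy)
    coefficient has been dropped from every frame.
    """
    if len(ref) != len(syn) or not ref:
        raise ValueError("sequences must be non-empty and equally long")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        # Per-frame distortion: sqrt(2 * sum of squared differences)
        total += math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(r, s)))
    return scale * total / len(ref)


def f0_rmse(ref_f0, syn_f0):
    """RMSE in Hz over frames where both contours are voiced (F0 > 0)."""
    pairs = [(r, s) for r, s in zip(ref_f0, syn_f0) if r > 0 and s > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum((r - s) ** 2 for r, s in pairs) / len(pairs))


def relative_reduction(baseline, proposed):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Toy values for illustration only (not results from the paper).
print(relative_reduction(4.0, 3.0))                   # 25.0
print(f0_rmse([100.0, 0.0, 200.0], [110.0, 120.0, 190.0]))  # 10.0
```

Under these assumptions, a "relative reduction of approximately 9%" in MCD means that the proposed system's MCD is about 9% lower than the baseline's when expressed as a fraction of the baseline value, not 9 dB or 9 percentage points lower.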


DOI: 10.21437/SSW.2016-6

Cite as

Lazaridis, A., Cernak, M., Honnet, P., Garner, P.N. (2016) Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis. Proc. 9th ISCA Speech Synthesis Workshop, 32-37.

Bibtex
@inproceedings{Lazaridis+2016,
author={Alexandros Lazaridis and Milos Cernak and Pierre-Edouard Honnet and Philip N. Garner},
title={Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis},
year=2016,
booktitle={9th ISCA Speech Synthesis Workshop},
doi={10.21437/SSW.2016-6},
url={http://dx.doi.org/10.21437/SSW.2016-6},
pages={32--37}
}