Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen


This paper introduces a general and flexible framework for F0 and aperiodicity (additive non-periodic component) analysis, intended specifically for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: an instantaneous frequency estimator and initial aperiodicity detector, an F0 trajectory tracker, and an F0 refinement and aperiodicity extractor. A preliminary implementation of the proposed framework substantially outperformed existing F0 extractors (by a factor of 10 in terms of RMS F0 estimation error) in its ability to track temporally varying F0 trajectories. The front-end aperiodicity detector consists of a complex-valued wavelet analysis filter with a highly selective temporal and spectral envelope. It uses a new measure that quantifies the deviation from periodicity. The measure is less sensitive to slow FM and AM and correlates closely with the signal-to-noise ratio. The front end combines instantaneous frequency information over a set of filter outputs, using the measure, to yield an observation probability map. The second stage generates the initial F0 trajectory using this map and signal power information. The final stage uses the deviation measure of each harmonic component and F0-adaptive time warping to refine the F0 and aperiodicity estimates. The framework is flexible enough to integrate other sources of instantaneous frequency when they provide relevant information.
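
As a rough illustration of the front-end idea only, the sketch below filters a test tone with a single complex-valued, Gaussian-windowed band-pass kernel and reads the instantaneous frequency from the phase increment of the filter output. It is not the authors' implementation: the kernel shape, the function names `complex_bandpass` and `instantaneous_frequency`, and all parameter values are assumptions chosen for the example. The aperiodicity measure, the probability map, and the trajectory tracker described in the abstract are not reproduced here.

```python
import numpy as np


def complex_bandpass(signal, fs, fc, periods=2.0):
    """Filter `signal` with a Gaussian-windowed complex exponential centred at fc (Hz).

    Hypothetical stand-in for one channel of a complex-valued wavelet filter bank.
    """
    sigma = periods / fc                      # Gaussian width in seconds (assumed)
    half_len = int(round(3.0 * sigma * fs))   # truncate the kernel at 3 sigma
    t = np.arange(-half_len, half_len + 1) / fs
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    kernel = envelope * np.exp(2j * np.pi * fc * t)
    kernel /= np.sum(envelope)                # unit gain at the centre frequency
    return np.convolve(signal, kernel, mode="same")


def instantaneous_frequency(analytic, fs):
    """Instantaneous frequency (Hz) from the sample-to-sample phase increment."""
    dphi = np.angle(analytic[1:] * np.conj(analytic[:-1]))
    return dphi * fs / (2.0 * np.pi)


fs = 16000                                    # sampling rate in Hz (assumed)
t = np.arange(int(0.5 * fs)) / fs
f0_true = 120.0                               # test tone frequency in Hz (assumed)
x = np.sin(2.0 * np.pi * f0_true * t)

# The filter is deliberately centred slightly off the true frequency (110 Hz).
y = complex_bandpass(x, fs, fc=110.0)
f_inst = instantaneous_frequency(y, fs)

# Use only the middle half of the signal to avoid filter edge effects.
mid = f_inst[len(f_inst) // 4: 3 * len(f_inst) // 4]
f0_estimate = float(np.median(mid))
print(f"estimated frequency: {f0_estimate:.2f} Hz")
```

Even though the filter is centred at 110 Hz, the phase increment of its output follows the frequency of the dominant component that passes through it. This is the property that makes instantaneous frequency useful as a cue when outputs from several filters are combined.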


DOI: 10.21437/SSW.2016-36

Cite as

Kawahara, H., Agiomyrgiannakis, Y., Zen, H. (2016) Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis. Proc. 9th ISCA Speech Synthesis Workshop, 221-228.

BibTeX
@inproceedings{Kawahara+2016,
author={Hideki Kawahara and Yannis Agiomyrgiannakis and Heiga Zen},
title={Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis},
year=2016,
booktitle={9th ISCA Speech Synthesis Workshop},
doi={10.21437/SSW.2016-36},
url={http://dx.doi.org/10.21437/SSW.2016-36},
pages={221--228}
}