Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation

Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu


This paper introduces a method of extracting coarse prosodic structure from fundamental frequency (F0) contours using a discriminative approach such as deep neural networks (DNNs), and applies it to the parameter estimation of the Fujisaki model. Conventional methods for estimating the parameters of the Fujisaki model have adopted generative approaches, in which estimation is treated as an inverse problem of the generation process. Recent developments in discriminative approaches, on the other hand, enable the problem to be treated in a direct manner. To introduce a discriminative approach to the parameter estimation of the Fujisaki model, for which precise parameter labels are expensive to obtain, this study focuses on the similarities between the acoustic realization of prosodic structure in F0 contours and the sentence structure of the read text. In the proposed method, the sentence structure obtained from the text is used as the label set for the discriminative model, and the model estimates the coarse prosodic structure. Finally, this structure is refined by a conventional parameter estimation method. Experimental results demonstrate that the proposed method improves estimation accuracy by 18% in terms of detection rate without using any auxiliary features at inference.
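For readers unfamiliar with the generation process the abstract refers to, the Fujisaki model expresses a log-F0 contour as a baseline value plus the responses of two critically damped second-order systems to phrase commands (impulses) and accent commands (pedestal functions). The sketch below is a minimal illustration of that standard formulation, not code from the paper; the time constants (alpha, beta) and ceiling (gamma) are conventional example values.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    # Gp(t) = alpha^2 * t * exp(-alpha*t) for t >= 0: impulse response
    # of the phrase control mechanism.
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * np.maximum(t, 0)), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    # Ga(t) = min[1 - (1 + beta*t) * exp(-beta*t), gamma] for t >= 0:
    # step response of the accent control mechanism, clipped at gamma.
    tp = np.maximum(t, 0)
    return np.where(t >= 0, np.minimum(1 - (1 + beta * tp) * np.exp(-beta * tp), gamma), 0.0)

def fujisaki_lnf0(t, fb, phrases, accents):
    """ln F0(t) = ln Fb
                  + sum_i Ap_i * Gp(t - T0_i)
                  + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]
    phrases: list of (Ap, T0) pairs; accents: list of (Aa, T1, T2) triples."""
    lnf0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrases:
        lnf0 += ap * phrase_response(t - t0)
    for aa, t1, t2 in accents:
        lnf0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return lnf0

# Example: one phrase command at t=0 and one accent command over [0.3, 0.8] s.
t = np.linspace(0.0, 2.0, 400)
contour = fujisaki_lnf0(t, fb=120.0, phrases=[(0.5, 0.0)], accents=[(0.4, 0.3, 0.8)])
```

Parameter estimation is the inverse of this mapping: recovering the command timings and amplitudes from an observed contour, which is the ill-posed problem the paper's discriminative front-end helps constrain.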


DOI: 10.21437/Interspeech.2020-2566

Cite as: Shirahata, Y., Saito, D., Minematsu, N. (2020) Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation. Proc. Interspeech 2020, 4427-4431, DOI: 10.21437/Interspeech.2020-2566.


@inproceedings{Shirahata2020,
  author={Yuma Shirahata and Daisuke Saito and Nobuaki Minematsu},
  title={{Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4427--4431},
  doi={10.21437/Interspeech.2020-2566},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2566}
}