Can we model pitch using only the f0 on sonorant rimes?

Daniel Hirst, Ting Wang


Modelling pitch patterns from acoustic data needs to take into account the fact that raw f0 curves are the product of an underlying global pitch pattern and a more local (micromelodic) influence of the individual speech sounds. This suggests the hypothesis that pitch could be modelled using only the f0 detected on sonorant rimes (vowels and sonorant codas). This paper describes an experiment to test that hypothesis. The test used Mandarin Chinese recordings and native-speaker listeners, on the assumption that evaluating synthetic prosody in a tone language would be a less metalinguistic task than in a language with no lexical tones. After an automatic alignment algorithm was applied to the recordings, two resynthesized versions were created: in the first, only the f0 on sonorant rimes was used for the model; in the second, the complete f0 curve was used. In both versions the f0 was modelled using the Momel algorithm. The two versions were then evaluated by 10 native speakers of Mandarin Chinese. Contrary to our hypothesis, the version using only the f0 detected on sonorant rimes was evaluated as significantly worse than the standard method of using the whole f0 curve. A number of reasons for this difference are discussed.
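
The restriction of the f0 curve to sonorant rimes can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the Momel algorithm; it only shows the masking step that would precede modelling. The function name `mask_f0_to_rimes`, the frame times, the f0 values and the rime intervals are all hypothetical, and the intervals are assumed to come from some prior automatic alignment.

```python
from typing import List, Optional, Tuple


def mask_f0_to_rimes(
    times: List[float],
    f0: List[Optional[float]],
    rimes: List[Tuple[float, float]],
) -> List[Optional[float]]:
    """Keep f0 only at frames whose time falls inside a sonorant-rime interval.

    Frames outside every interval are set to None (treated as unvoiced),
    so a later modelling step would see only the rime portions.
    """
    masked = []
    for t, value in zip(times, f0):
        inside = any(start <= t <= end for start, end in rimes)
        masked.append(value if inside else None)
    return masked


# Hypothetical data: frame times in seconds, f0 in Hz (None = no f0 detected).
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
f0 = [None, 210.0, 215.0, 190.0, 220.0, None]
# Hypothetical rime intervals (start, end) in seconds from an alignment.
rimes = [(0.01, 0.02), (0.04, 0.05)]

print(mask_f0_to_rimes(times, f0, rimes))
```

In this sketch the frame at 0.03 s, which lies between the two rimes, is discarded even though an f0 value was detected there; the whole-curve condition would keep it.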


DOI: 10.21437/SpeechProsody.2018-135

Cite as: Hirst, D., Wang, T. (2018) Can we model pitch using only the f0 on sonorant rimes? Proc. 9th International Conference on Speech Prosody 2018, 666-670, DOI: 10.21437/SpeechProsody.2018-135.


@inproceedings{Hirst2018,
  author={Daniel Hirst and Ting Wang},
  title={Can we model pitch using only the f0 on sonorant rimes?},
  year=2018,
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},
  pages={666--670},
  doi={10.21437/SpeechProsody.2018-135},
  url={http://dx.doi.org/10.21437/SpeechProsody.2018-135}
}