Physiological pitch range estimation from a brief speech input: A study on a bilingual parallel speech corpus

Wei Zhang, Yanlu Xie, Jinsong Zhang


The maximal range of pitch that a speaker can produce is constrained by the speaker’s physiological characteristics. Unlike the speaking pitch range observed in speech samples, this ‘physiological pitch range’ is independent of the content of the speech, yet human listeners can partly estimate it from even a brief utterance, using not only the fundamental frequency (F0) but also spectral features. In our previous work, we proposed a spectrum-based algorithm for estimating physiological pitch range from brief speech, which outperformed the traditional F0 analysis method when the speech input was as short as 300 ms. The present study further tested the algorithm on a Japanese-Chinese parallel speech corpus produced by a group of native speakers of Japanese who spoke Mandarin as a second language. For each speaker, the proposed algorithm obtained almost the same pitch range from his/her L1 and L2 speech data, whereas the traditional method gave two estimates that differed more. The results confirmed that the proposed algorithm was better able to estimate a speaker’s physiological pitch range from brief speech.


DOI: 10.21437/SpeechProsody.2020-196

Cite as: Zhang, W., Xie, Y., Zhang, J. (2020) Physiological pitch range estimation from a brief speech input: A study on a bilingual parallel speech corpus. Proc. 10th International Conference on Speech Prosody 2020, 960-964, DOI: 10.21437/SpeechProsody.2020-196.


@inproceedings{Zhang2020,
  author={Wei Zhang and Yanlu Xie and Jinsong Zhang},
  title={{Physiological pitch range estimation from a brief speech input: A study on a bilingual parallel speech corpus}},
  year=2020,
  booktitle={Proc. 10th International Conference on Speech Prosody 2020},
  pages={960--964},
  doi={10.21437/SpeechProsody.2020-196},
  url={http://dx.doi.org/10.21437/SpeechProsody.2020-196}
}