Intermediate features are not useful for tone perception

Yue Chen, Yi Xu


Many theories assume that speech perception proceeds by first extracting features, such as distinctive features, tonal features, or articulatory gestures, before recognizing phonetic units such as segments and tones. But it is unclear exactly how extracted features can lead to effective phonetic recognition. In this study we explore this issue by using a support vector machine (SVM), a supervised machine learning model, to simulate the recognition of Mandarin tones from F0 in continuous speech. We tested how well a five-level system or a binary distinctive-feature system can identify Mandarin tones by training the SVM model on F0 trajectories with reduced temporal and frequency resolutions. At full resolution, the recognition rates were 97% and 86% based on the semitone and Hertz scales, respectively. At reduced temporal resolution, there was no clear decline in recognition rate until the trajectories were reduced to two points per syllable. At reduced frequency resolution, the recognition rate dropped rapidly: with 5 frequency bands, the accuracy was around 40% on both the Hertz and semitone scales. These results suggest that intermediate featural representations provide no benefit for tone recognition and are unlikely to be critical for tone perception.
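
The two manipulations described above, reducing temporal resolution and reducing frequency resolution, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 100 Hz reference frequency, the even-spacing sampling rule, and the equal-width band edges are all assumptions made for illustration, since the abstract does not specify them. The sketch only prepares F0 vectors; in a setup like the one described, such vectors would then be passed to a classifier such as an SVM.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def downsample(contour, n_points):
    """Reduce temporal resolution: keep n_points evenly spaced samples.

    Endpoints are kept when n_points >= 2; a single point is taken
    from the middle of the contour.
    """
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    if n_points == 1:
        return [contour[len(contour) // 2]]
    last = len(contour) - 1
    return [contour[round(i * last / (n_points - 1))] for i in range(n_points)]


def quantize(contour, n_bands, lo, hi):
    """Reduce frequency resolution: map each value to a band index 0..n_bands-1.

    Bands are equal-width over [lo, hi] (an assumption); values outside
    the range are clamped to the nearest band.
    """
    width = (hi - lo) / n_bands
    bands = []
    for value in contour:
        index = int((value - lo) / width)
        bands.append(min(max(index, 0), n_bands - 1))
    return bands


# Hypothetical rising contour for one syllable, in Hz.
syllable_hz = [100.0, 125.0, 150.0, 175.0, 200.0]
syllable_st = hz_to_semitones(syllable_hz)

three_points = downsample(syllable_hz, 3)          # coarse temporal resolution
five_levels = quantize(syllable_st, 5, 0.0, 12.0)  # five-level representation
two_levels = quantize(syllable_st, 2, 0.0, 12.0)   # binary (high/low) representation
```

With five bands, each F0 sample is replaced by one of five level labels, which loosely mirrors a five-level tonal representation; with two bands it mirrors a binary high/low feature. The finding reported above is that recognition degrades sharply under this kind of discretization, while it tolerates substantial temporal downsampling.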


DOI: 10.21437/SpeechProsody.2020-105

Cite as: Chen, Y., Xu, Y. (2020) Intermediate features are not useful for tone perception. Proc. 10th International Conference on Speech Prosody 2020, 513-517, DOI: 10.21437/SpeechProsody.2020-105.


@inproceedings{Chen2020,
  author={Yue Chen and Yi Xu},
  title={{Intermediate features are not useful for tone perception}},
  year=2020,
  booktitle={Proc. 10th International Conference on Speech Prosody 2020},
  pages={513--517},
  doi={10.21437/SpeechProsody.2020-105},
  url={http://dx.doi.org/10.21437/SpeechProsody.2020-105}
}