What Automatic Speech Recognition Can Tell Us About Stress and Stress Shift in Continuous Speech

Simone de Lemos


I examine lexical stress and stress shift in contexts of stress clash in Brazilian Portuguese (BP) continuous speech data. I start by investigating whether an automatic speech recognition (ASR) toolkit can detect lexical stress using spectral information, as represented by the Mel Frequency Cepstral Coefficients (MFCCs) of stressed and unstressed vowels. The ASR toolkit was trained using a phonetic dictionary in which each entry was labeled for primary stress, a list of phones, transcripts, and a language model (LM). The resulting acoustic model was then used in two test scenarios in which the task of choosing the stressed vowel in a word token was increasingly complex. The model achieved overall accuracy rates of 92.57% and 80.97%, respectively. To investigate stress shift, I use speech data from a production study recorded in Brazil. In the study, speakers were asked to utter syntactically ambiguous sentences using prosody that would cue one of two possible meanings (and structures). Stress clash would (potentially) be resolved by means of stress shift in one of the structures. Preliminary results showed apparent stress shift in roughly 20% of the contexts identified by a human referee as having the syntactic structure where stress shift would occur.
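The abstract does not give the dictionary format or the scoring procedure, so the following is only a minimal sketch of how a "choose the stressed vowel" task might be framed. The vowel inventory, the `1`/`0` stress suffixes, and the function names `stress_candidates` and `accuracy` are all assumptions made for illustration, not the setup used in the study.

```python
# Hypothetical illustration: the phone symbols and the "1"/"0" stress
# suffixes are assumptions, not the paper's actual dictionary format.
VOWELS = {"a", "e", "i", "o", "u"}


def stress_candidates(phones):
    """Return one pronunciation variant per vowel in the word.

    In each variant exactly one vowel carries primary stress ("1")
    and every other vowel is marked unstressed ("0").
    """
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    variants = []
    for stressed in vowel_positions:
        variant = [
            (p + ("1" if i == stressed else "0")) if p in VOWELS else p
            for i, p in enumerate(phones)
        ]
        variants.append(variant)
    return variants


def accuracy(predicted, reference):
    """Fraction of word tokens whose predicted variant equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy word with two vowels: two competing stress placements.
candidates = stress_candidates(["k", "a", "f", "e"])
```

Under these assumptions, each variant would act as a competing dictionary entry for the same word, and a recognizer's choice among them could be compared against the stress-labeled reference with `accuracy`.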


DOI: 10.21437/SpeechProsody.2018-199

Cite as: de Lemos, S. (2018) What Automatic Speech Recognition Can Tell Us About Stress and Stress Shift in Continuous Speech. Proc. 9th International Conference on Speech Prosody 2018, 984-988, DOI: 10.21437/SpeechProsody.2018-199.


@inproceedings{deLemos2018,
  author={Simone {de Lemos}},
  title={What Automatic Speech Recognition Can Tell Us About Stress and Stress Shift in Continuous Speech},
  year=2018,
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},
  pages={984--988},
  doi={10.21437/SpeechProsody.2018-199},
  url={http://dx.doi.org/10.21437/SpeechProsody.2018-199}
}