VOP Detection in Variable Speech Rate Condition

Ayush Agarwal, Jagabandhu Mishra, S.R. Mahadeva Prasanna


The vowel onset point (VOP) is the instant at which a vowel begins in a given speech segment. Many speech processing applications rely on VOP information to extract features from the speech signal, and in such cases the overall performance depends largely on how accurately the VOP location is detected. Many algorithms have been proposed in the literature for the automatic detection of VOPs. Most of these methods assume that the given speech signal is produced at a normal speech rate, and all the parameters for smoothing the speech signal evidence and for hypothesizing VOPs are set accordingly. However, these parameter settings may not work well under variable speech rate conditions. This work proposes a dynamic first-order Gaussian differentiator (FOGD) window based approach to overcome this issue. The proposed approach is evaluated on a subset of the TIMIT dataset with manually marked ground-truth VOPs. Compared with the existing approach, the proposed approach improves VOP detection performance at both higher and lower speech rates.
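The idea of a speech-rate-dependent FOGD window can be illustrated with the minimal sketch below. It is not the authors' implementation: the abstract does not specify the evidence signal, the rule that maps speech rate to window length, or any thresholds. The function names, the inverse-proportional scaling rule, the reference rate of 4 syllables per second, the 100 ms base window, and the relative peak threshold are all assumptions made for illustration.

```python
import numpy as np


def fogd_kernel(length):
    """First-order difference of a Gaussian window (a FOGD operator)."""
    sigma = length / 6.0  # assumption: the window spans roughly +/- 3 sigma
    n = np.arange(length) - (length - 1) / 2.0
    gauss = np.exp(-0.5 * (n / sigma) ** 2)
    return np.diff(gauss)


def dynamic_window_length(frame_rate, speech_rate,
                          base_window_s=0.1, ref_rate=4.0):
    """Hypothetical rule: shorten the window for fast speech, widen it for slow.

    frame_rate  : evidence frames per second
    speech_rate : estimated rate (e.g. syllables per second), assumed given
    """
    window_s = base_window_s * ref_rate / speech_rate
    length = max(5, int(round(window_s * frame_rate)))
    return length | 1  # force an odd length so the Gaussian has a clear centre


def detect_vops(evidence, frame_rate, speech_rate, rel_threshold=0.3):
    """Return frame indices of hypothesized VOPs in a 1-D evidence contour."""
    evidence = np.asarray(evidence, dtype=float)
    if evidence.size < 3:
        return []
    kernel = fogd_kernel(dynamic_window_length(frame_rate, speech_rate))
    # Convolving with the FOGD approximates the slope of the smoothed evidence,
    # so rising edges of the evidence appear as positive peaks.
    slope = np.convolve(evidence, kernel, mode="same")
    if slope.max() <= 0:
        return []
    threshold = rel_threshold * slope.max()
    return [
        i for i in range(1, len(slope) - 1)
        if slope[i] > threshold
        and slope[i] > slope[i - 1]
        and slope[i] >= slope[i + 1]
    ]
```

Calling `detect_vops(evidence, frame_rate=100, speech_rate=4.0)` on an energy-like contour returns the frames where the smoothed evidence rises most steeply. With a fixed window, the same smoothing would be applied at every speech rate; here a higher `speech_rate` yields a shorter kernel, so closely spaced onsets are less likely to be merged, and a lower one yields a longer kernel that suppresses spurious peaks.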


DOI: 10.21437/Interspeech.2020-2326

Cite as: Agarwal, A., Mishra, J., Prasanna, S.M. (2020) VOP Detection in Variable Speech Rate Condition. Proc. Interspeech 2020, 3690-3694, DOI: 10.21437/Interspeech.2020-2326.


@inproceedings{Agarwal2020,
  author={Ayush Agarwal and Jagabandhu Mishra and S.R. Mahadeva Prasanna},
  title={{VOP Detection in Variable Speech Rate Condition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3690--3694},
  doi={10.21437/Interspeech.2020-2326},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2326}
}