ISCA Archive Interspeech 2013

Phone duration modeling using clustering of rich contexts

Tanel Alumäe, Rena Nemoto

This paper describes a phone duration model applied to speech recognition. The model is based on a decision tree that finds clusters of phones in various contexts that tend to have similar durations. Wide contexts with rich linguistic and phonetic features are used. To better model varying and non-stationary speaking rates, the contextual features also include the observed duration values of previous phones. For each resulting phone cluster, a log-normal distribution of duration is estimated. The resulting decision tree and the log-normal distributions are used to calculate likelihoods of phone durations in N-best lists. Experiments on two Estonian recognition tasks show a small but significant improvement in speech recognition accuracy.
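The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes each phone token has already been assigned to a cluster (in the paper this is done by the decision tree, which is not reproduced here) and that durations are positive frame counts. The function names, the variance floor, and the interpolation weight `weight` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math


def fit_lognormal(durations, min_var=1e-6):
    """Estimate (mu, sigma) of a log-normal from positive durations.

    mu and sigma are the mean and standard deviation of log(duration).
    The variance floor min_var is an assumption, not from the paper.
    """
    logs = [math.log(d) for d in durations]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(max(var, min_var))


def lognormal_logpdf(d, mu, sigma):
    """Log density of a log-normal distribution at duration d > 0."""
    z = (math.log(d) - mu) / sigma
    return -math.log(d * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def duration_score(phones, models):
    """Sum of log duration likelihoods over one hypothesis.

    phones: list of (cluster_id, duration) pairs.
    models: dict mapping cluster_id to (mu, sigma).
    """
    return sum(lognormal_logpdf(d, *models[c]) for c, d in phones)


def rerank(nbest, models, weight=1.0):
    """Return the index of the best hypothesis in an N-best list.

    nbest: list of (base_score, phones), base_score being the
    recognizer's log score. Adding the weighted duration score to
    base_score is an assumed combination rule for this sketch.
    """
    totals = [base + weight * duration_score(phones, models)
              for base, phones in nbest]
    return max(range(len(totals)), key=totals.__getitem__)
```

For example, `models = {"a_stressed": fit_lognormal([9, 10, 11, 12])}` would give one cluster model (the cluster name is made up), and `rerank` would then prefer hypotheses whose phone durations are typical for their clusters when the base scores are close.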

doi: 10.21437/Interspeech.2013-445

Cite as: Alumäe, T., Nemoto, R. (2013) Phone duration modeling using clustering of rich contexts. Proc. Interspeech 2013, 1801-1805, doi: 10.21437/Interspeech.2013-445

  author={Tanel Alumäe and Rena Nemoto},
  title={{Phone duration modeling using clustering of rich contexts}},
  booktitle={Proc. Interspeech 2013},