ISCA Archive Interspeech 2013

Speech acoustic unit segmentation using hierarchical dirichlet processes

Amir Hossein Harati Nejad Torbati, Joseph Picone, Marc Sobel

Speech recognition systems have historically used context-dependent phones as acoustic units because these units allow linguistic information, such as a pronunciation lexicon, to be leveraged. However, when dealing with a new language for which minimal linguistic resources exist, it is desirable to automatically discover acoustic units. The process of discovering acoustic units usually consists of two stages: segmentation and clustering. In this paper, we focus on the segmentation portion of this problem. We introduce a nonparametric Bayesian approach for segmentation, based on Hierarchical Dirichlet Processes (HDP), in which a hidden Markov model (HMM) with an unbounded number of states is used to segment the utterance. This model is referred to as an HDP-HMM. We compare this algorithm to several popular heuristic methods and demonstrate an 11% improvement in finding boundaries on the TIMIT corpus. A self-similarity measure over segments shows an 88% improvement compared to manual segmentation with comparable segment length. This work represents the first step in the development of a speech recognition system that is entirely based on nonparametric Bayesian models.
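The abstract does not spell out the prior or the inference procedure, so the following is only a minimal sketch, under stated assumptions, of how an HDP-HMM can yield a segmentation. It uses the finite "weak-limit" approximation of the HDP (truncating the unbounded state space at L states, as in Fox et al.'s sticky HDP-HMM) and optionally adds a self-transition bias `kappa`; whether the paper used either choice is an assumption. It samples a state sequence from the prior and then places a segment boundary wherever the state changes. A real system would instead infer the state sequence from acoustic feature frames (for example by Gibbs sampling). All function and parameter names (`sample_weak_limit_hdp_hmm`, `states_to_segments`, `gamma`, `alpha`, `kappa`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alphas: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alphas) by normalising independent Gamma(a, 1) draws."""
    # Floor tiny concentrations: random.gammavariate rejects alpha <= 0.
    draws = [rng.gammavariate(max(a, 1e-10), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_weak_limit_hdp_hmm(num_states: int, gamma: float, alpha: float,
                              kappa: float, rng: random.Random):
    """Weak-limit (truncated at num_states = L) HDP prior over HMM transitions.

    beta ~ Dir(gamma/L, ..., gamma/L) is the global distribution shared by all
    states; each transition row pi_j ~ Dir(alpha*beta + kappa*e_j).
    kappa > 0 is the (assumed) "sticky" self-transition bias; kappa = 0 gives
    the plain HDP-HMM prior.
    """
    beta = sample_dirichlet([gamma / num_states] * num_states, rng)
    rows = []
    for j in range(num_states):
        conc = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
        rows.append(sample_dirichlet(conc, rng))
    return beta, rows


def sample_states(beta: Sequence[float], rows: Sequence[Sequence[float]],
                  num_frames: int, rng: random.Random) -> List[int]:
    """Sample a frame-level state sequence z_1..z_T from the Markov chain."""
    states = list(range(len(beta)))
    z = [rng.choices(states, weights=beta)[0]]
    for _ in range(num_frames - 1):
        z.append(rng.choices(states, weights=rows[z[-1]])[0])
    return z


def states_to_segments(z: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse runs of identical states into (start, end, state), end exclusive.

    A segment boundary is hypothesised at every frame where the state changes.
    """
    segments = []
    start = 0
    for t in range(1, len(z) + 1):
        if t == len(z) or z[t] != z[t - 1]:
            segments.append((start, t, z[start]))
            start = t
    return segments


if __name__ == "__main__":
    rng = random.Random(0)
    beta, rows = sample_weak_limit_hdp_hmm(num_states=20, gamma=1.0,
                                           alpha=5.0, kappa=50.0, rng=rng)
    z = sample_states(beta, rows, num_frames=100, rng=rng)
    for start, end, state in states_to_segments(z):
        print(f"frames {start:3d}-{end:3d}: state {state}")
```

Because the weak-limit prior concentrates most of the mass of `beta` on a few states, only a handful of the L available states are typically used, which is how the model lets the data choose the effective number of units. A large `kappa` lengthens runs of the same state, which in this sketch translates directly into longer segments.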

doi: 10.21437/Interspeech.2013-184

Cite as: Torbati, A.H.H.N., Picone, J., Sobel, M. (2013) Speech acoustic unit segmentation using hierarchical dirichlet processes. Proc. Interspeech 2013, 637-641, doi: 10.21437/Interspeech.2013-184

  author={Amir Hossein Harati Nejad Torbati and Joseph Picone and Marc Sobel},
  title={{Speech acoustic unit segmentation using hierarchical dirichlet processes}},
  booktitle={Proc. Interspeech 2013},