4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Stochastic Perceptual Speech Models with Durational Dependence

Jeff Bilmes (1,2), Nelson Morgan (2), Su-Lin Wu (1,2), Hervé Bourlard (2,3)

(1) International Computer Science Institute (ICSI), Berkeley, CA, USA
(2) Dept. of Computer Science, U. of California, Berkeley, USA
(3) Faculté Polytechnique de Mons, Belgium

In [6], we developed a statistical model of speech recognition in which emphasis is placed on the perceptually relevant and information-rich portions of the speech signal. In that model, speech is viewed as a sequence of elementary decisions, or Auditory Events (avents), that are made in response to loci of significant spectral change. These decision points are interleaved with periods during which insufficient information has accumulated to make the next decision. We call this a Stochastic Perceptual Avent Model, or SPAM. In the work reported here, we extend our initial experimental implementation [7] to include other probabilistic dependencies specified in the original theory, in particular the dependence on the time elapsed between the previous hypothesized avent and the current frame.

Bibliographic reference.  Bilmes, Jeff / Morgan, Nelson / Wu, Su-Lin / Bourlard, Hervé (1996): "Stochastic perceptual speech models with durational dependence", In ICSLP-1996, 1301-1304.