Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Analysis of Nonmodal Phonation Using Minimum Entropy Deconvolution

Nicolas Malyska, Thomas F. Quatieri

Massachusetts Institute of Technology, USA

Nonmodal phonation occurs when glottal pulses vary from one pulse to the next in spacing, amplitude, and/or shape. Analysis of such nonmodal regions has applications in automatic speech, speaker, language, and dialect recognition. In this paper, we examine the usefulness of minimum-entropy deconvolution (MED) [1] for analyzing pulse events in nonmodal speech. Using both natural and synthetic speech, we present evidence that MED decomposes nonmodal phonation into a series of sharp pulses and a set of mixed-phase impulse responses. We show that the estimated impulse responses are quantitatively similar to those in our synthesis model. We also introduce a hybrid method that incorporates aspects of both MED and linear prediction. Preliminary evidence suggests that, for composite impulse-response estimation, the hybrid method is more robust than MED alone to short-time windowing effects and to the aspiration-noise component of speech.
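
To make the idea of MED concrete, the sketch below shows one common formulation: an FIR filter is iteratively adjusted to maximize the varimax norm (sum of y^4 divided by the squared sum of y^2) of its output, which favors sharp, sparse pulses. Whether this matches the exact variant used in the paper is an assumption. The function names, the filter length, the iteration count, the small ridge term, and the synthetic test signal are all hypothetical choices made for illustration, and the hybrid MED/linear-prediction method is not shown.

```python
import numpy as np


def varimax(y):
    """Varimax norm sum(y^4) / sum(y^2)^2; larger values mean a spikier signal."""
    y = np.asarray(y, dtype=float)
    return np.sum(y**4) / np.sum(y**2) ** 2


def med(x, filt_len=40, n_iter=20, ridge=1e-6):
    """Illustrative varimax-norm MED; returns (filter, deconvolved output)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Row l holds x delayed by l samples, so f @ X is the "valid" part of f * x.
    X = np.array([x[filt_len - 1 - l : n - l] for l in range(filt_len)])
    R = X @ X.T  # autocorrelation matrix of the input
    # Assumption: a small ridge keeps the linear solve well conditioned.
    R += ridge * np.trace(R) / filt_len * np.eye(filt_len)
    f = np.zeros(filt_len)
    f[filt_len // 2] = 1.0  # start from a delayed unit impulse
    for _ in range(n_iter):
        y = f @ X
        g = X @ y**3  # cross-correlation of the cubed output with the input
        f = np.linalg.solve(R, g)
        f /= np.linalg.norm(f)  # varimax is scale invariant, so fix the gain
    return f, f @ X


def synthetic_nonmodal(n=800, seed=0):
    """Hypothetical test signal: irregular pulses through a mixed-phase response."""
    rng = np.random.default_rng(seed)
    pulses = np.zeros(n)
    pos = 20
    while pos < n - 60:
        pulses[pos] = rng.uniform(0.4, 1.0)  # irregular amplitude
        pos += int(rng.integers(35, 80))  # irregular spacing
    k = np.arange(40)
    causal = 0.9**k * np.cos(0.3 * np.pi * k)  # decaying resonance
    anticausal = (0.7 ** np.arange(8))[::-1]  # time-reversed decay
    h = np.convolve(causal, anticausal)  # mixed-phase impulse response
    x = np.convolve(pulses, h)[:n] + 0.01 * rng.standard_normal(n)
    return x, pulses
```

Calling `x, pulses = synthetic_nonmodal()` followed by `f, y = med(x)` gives a unit-norm filter `f` and an output `y` that can be compared with `pulses`. MED recovers the pulse train only up to an unknown delay, scale, and sign, so any comparison should allow for those.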

Bibliographic reference. Malyska, Nicolas / Quatieri, Thomas F. (2006): "Analysis of nonmodal phonation using minimum entropy deconvolution", in INTERSPEECH-2006, paper 1807-Wed2A1O.3.