4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Synthesis of Stressed Speech from Isolated Neutral Speech Using HMM-based Models

Sahar E. Bou-Ghazale, John H. L. Hansen

Robust Speech Processing Laboratory, Duke University Department of Electrical and Computer Engineering, Durham, NC, USA

In this study, a novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions, and it is employed in a technique for stressed speech synthesis. The proposed method models the variations in pitch contour, voiced speech duration, and average spectral structure using Hidden Markov Models (HMMs). While HMMs have traditionally been used for recognition, here they are used to statistically model the characteristics needed to generate pitch contour and spectral slope patterns that modify the speaking style of isolated neutral words. A stress perturbation algorithm is developed that combines an analysis-synthesis speech model with the HMM-derived pitch and spectral stress characteristics. Informal listener evaluations of the stress-modified speech confirm the HMMs' ability to capture parameter variations under stressed conditions. The proposed HMMs are both speaker- and word-independent, but unique to each speaking style. While the modeling scheme is applicable to a variety of stress and emotional speaking styles, the evaluations presented in this study focus on angry, Lombard effect, and loud speech.
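As a rough illustration of how a generative HMM could drive pitch perturbation of a neutral word, the Python sketch below samples a state path from a left-to-right HMM and scales each voiced frame of a neutral F0 contour by a Gaussian multiplier drawn from the current state. This is a minimal sketch under stated assumptions, not the authors' algorithm: the three-state topology, the multiplicative F0 model, the parameter values, and the function names `sample_hmm_path` and `perturb_pitch` are all hypothetical. The duration and spectral-slope modifications described above are omitted.

```python
import numpy as np


def sample_hmm_path(trans, n_frames, rng, start_state=0):
    """Sample a state path of length n_frames from a transition matrix.

    Hypothetical helper: each row of `trans` gives the next-state
    probabilities for the corresponding current state.
    """
    states = np.empty(n_frames, dtype=int)
    states[0] = start_state
    for t in range(1, n_frames):
        states[t] = rng.choice(len(trans), p=trans[states[t - 1]])
    return states


def perturb_pitch(neutral_f0, trans, means, stds, seed=0):
    """Scale voiced frames of a neutral F0 contour by HMM-generated multipliers.

    Assumption: each state emits a Gaussian pitch multiplier with the given
    mean and standard deviation. Unvoiced frames (F0 == 0) are left unchanged.
    """
    rng = np.random.default_rng(seed)
    f0 = np.asarray(neutral_f0, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    voiced = f0 > 0
    out = f0.copy()
    n_voiced = int(voiced.sum())
    if n_voiced == 0:
        return out
    # One HMM state per voiced frame; the state selects the emission parameters.
    path = sample_hmm_path(trans, n_voiced, rng)
    multipliers = rng.normal(means[path], stds[path])
    out[voiced] = f0[voiced] * multipliers
    return out


if __name__ == "__main__":
    # Hypothetical left-to-right "angry" pitch model with three states.
    trans = np.array([[0.8, 0.2, 0.0],
                      [0.0, 0.8, 0.2],
                      [0.0, 0.0, 1.0]])
    neutral = np.array([0, 120, 125, 130, 0, 128, 122, 0], dtype=float)
    print(perturb_pitch(neutral, trans, [1.2, 1.4, 1.1], [0.05, 0.05, 0.05]))
```

A left-to-right topology is used here because it loosely mirrors the onset, peak, and decay of pitch movement across a word; a trained model would instead estimate the transition and emission parameters from stressed speech.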

