Speech Prosody 2010

Chicago, IL, USA
May 10-14, 2010

Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features

Nicolas Obin (1), Pierre Lanchantin (1), Mathieu Avanzi (2,3), Anne Lacheret-Dujour (3), Xavier Rodet (1)

(1) Analysis-Synthesis Team, IRCAM, Paris, France
(2) Neuchâtel University, Neuchâtel, Switzerland
(3) Modyco Lab., Paris-Ouest University, Nanterre, France

A major drawback of current Hidden Markov Model (HMM)-based speech synthesis is the monotony of the generated speech, which is closely related to the monotony of the generated prosody. Complementary to model-oriented approaches that aim to increase prosodic variability by reducing the "over-smoothing" effect, this paper presents a linguistically oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling. A linguistic processing chain based on linguistic preprocessing, morpho-syntactical labeling, and syntactical parsing is used to extract high-level syntactical features from an input text. These linguistic features are then introduced into an HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations). Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if the improvement depends on the observed linguistic phenomenon.

Index Terms: HMM-based speech synthesis, Prosody, High-Level Syntactical Analysis

Bibliographic reference. Obin, Nicolas / Lanchantin, Pierre / Avanzi, Mathieu / Lacheret-Dujour, Anne / Rodet, Xavier (2010): "Toward improved HMM-based speech synthesis using high-level syntactical features", In SP-2010, paper 133.