First European Conference on Speech Communication and Technology

Paris, France
September 27-29, 1989

On Automatic Extraction of Prosodic Information for Automatic Speech Recognition System

Jacqueline Vaissière

Departement Services Multimedia et Dialogues, Centre National d'Etudes des Telecommunications (CNET), Lannion, France

This paper is concerned with three types of causes leading to errors in a system using strictly speaker-independent rules for automatic extraction of linguistic information from measured prosodic parameters (PP) in read isolated sentences in French: erroneous measurements of PP, duration and fundamental frequency (type-1 errors); differences between speakers who do not fit into the same prosodic mould (type-2 problems); and certain combinations of segmental influences on duration, which cannot be factored out in a strictly bottom-up system (type-3 problems). It suggests that neither further tuning of the existing rules nor statistical learning is a complete solution. Type-1 errors are extrinsic to the prosodic module and can hardly be reduced. An effective way of reducing uncertainties due to type-2 problems is a partial tuning of the set of rules to the particular habits of the speaker: adaptation is feasible because there is a remarkable intra-speaker consistency in prosodic patterning, at least in serially read isolated sentences. Type-3 problems lead to multiple solutions in certain cases. It is therefore necessary to model inter-speaker variability to a certain extent.


Bibliographic reference. Vaissière, Jacqueline (1989): "On automatic extraction of prosodic information for automatic speech recognition system", in EUROSPEECH-1989, 1202-1205.