Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

A Neural-Network-Based Model of Segmental Duration for Speech Synthesis

Marcel Riedi

Speech Processing Group, Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland

This paper presents a neural-network-based model of segmental duration, developed for use in German speech synthesis. Given a set of factors that influence the duration of a phone-sized segment, a neural network predicts the segment duration. Different mappings of these factors to values suitable for networks with binary and analog input nodes have been applied. So far, the highest correlation coefficient between the observed and predicted segment durations of a test set is 0.886. Informal acoustical tests with this model in combination with a speech synthesis system further demonstrated the feasibility of this approach.
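
As an illustration of the two ideas mentioned above (mapping factors to binary or analog input nodes, and scoring predictions by a correlation coefficient), the following is a minimal Python sketch. It is not the author's implementation: the abstract does not list the factors used, so the phone classes, the stress flag, the phrase-position factor, and the function names `encode_binary`, `encode_analog`, and `pearson` are all hypothetical, and the Pearson coefficient is assumed as the correlation measure.

```python
import math

# Hypothetical factor inventory; the abstract does not list the real factors.
PHONE_CLASSES = ["vowel", "plosive", "fricative", "nasal"]


def encode_binary(phone_class, stressed):
    """Map categorical factors to binary input nodes.

    One node per phone class (one-hot) plus one node for the stress flag.
    """
    nodes = [1.0 if phone_class == c else 0.0 for c in PHONE_CLASSES]
    nodes.append(1.0 if stressed else 0.0)
    return nodes


def encode_analog(position_in_phrase, phrase_length):
    """Map a positional factor to a single analog input node in [0, 1]."""
    if phrase_length <= 1:
        return [0.0]
    return [position_in_phrase / (phrase_length - 1)]


def pearson(observed, predicted):
    """Pearson correlation between observed and predicted durations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (dev_o * dev_p)


# Input vector for a stressed plosive in the middle of a five-segment phrase.
input_vector = encode_binary("plosive", True) + encode_analog(2, 5)
```

A network input for one segment would then be the concatenation of the binary and analog nodes, and `pearson` would be applied to the observed and predicted durations of a held-out test set.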

Bibliographic reference. Riedi, Marcel (1995): "A neural-network-based model of segmental duration for speech synthesis", in EUROSPEECH-1995, 599-602.