Second International Conference on Spoken Language Processing (ICSLP'92)
Banff, Alberta, Canada
This paper proposes a duration model and a pitch pattern generation model that take into account the influence of speech tempo on duration and pitch. Results of an analysis of these influences are described. The analysis showed that, among the factors affecting the speech rate of individual phrases, the position of the phrase within the sentence is the most influential at fast tempo, whereas at normal and slow tempos the most important factor is whether a pause follows the phrase. The analysis also revealed that, although pitch frequency values may differ across tempos, the normalized pitch patterns for a given sentence are quite similar. On the basis of these results, a duration model was constructed that determines a suitable tempo for a given sentence or paragraph and estimates the durations of the individual phrases that constitute it. The durations of the phonemes within the phrases are estimated according to the phoneme environment. A pitch pattern generation model was also constructed; it determines a normalized pitch pattern that is little affected by changes in speech tempo, and then calculates the pitch frequencies that would actually be produced at various tempos. Both models have speech tempo parameters and can generate appropriate durations and pitch contours for a given tempo.
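The idea of a tempo-independent normalized pitch pattern that is mapped back to actual pitch frequencies can be illustrated with a minimal sketch. The abstract does not state how the pattern is normalized, so the log-F0 z-score normalization below is an assumption made for illustration only, not the authors' method; the function names, the example contour, and the tempo-specific mean and spread adjustments are likewise hypothetical.

```python
import math


def normalize(f0_hz):
    """Return (pattern, mean, std) for an F0 contour given in Hz.

    Assumption: the pattern is the z-score of log-F0. The abstract does
    not specify the normalization actually used in the paper.
    """
    logs = [math.log(f) for f in f0_hz]
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    pattern = [(x - mean) / std for x in logs]
    return pattern, mean, std


def denormalize(pattern, mean, std):
    """Map a normalized pattern back to Hz using a given log-F0 mean and spread."""
    return [math.exp(mean + std * z) for z in pattern]


# Hypothetical contour (Hz) for one sentence at normal tempo.
normal_f0 = [120.0, 150.0, 130.0, 100.0]
pattern, mean, std = normalize(normal_f0)

# Hypothetical tempo effect: a shifted mean and a compressed range.
fast_f0 = denormalize(pattern, mean + 0.1, std * 0.8)

# The Hz values differ between tempos, but the normalized patterns coincide.
fast_pattern, _, _ = normalize(fast_f0)
```

In this sketch the tempo parameter acts only on the mean and spread of log-F0, which is one simple way to keep the normalized shape fixed while the produced frequencies change.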
Bibliographic reference. Iwata, Kazuhiko / Mitome, Yukio (1992): "Prosody generation models constructed by considering speech tempo influence on prosody", In ICSLP-1992, 1155-1158.