4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This report proposes a simple and practical model for generating relatively monotonous, but sufficiently natural, prosodic features by analyzing restricted natural speech. The basic assumption of this model is that a natural F0 pattern can be obtained without complicated linguistic analysis. To achieve this prosodic control, we analyzed and modeled recorded speech material as follows. First, we hypothesized that a Japanese Major Phrase (MP) can be modeled as a combination of a sequence of fewer than three minor phrases (mp). The number of combinations is determined by the accentual type of each minor phrase and its position within the sentence; there are 28 combination types. To confirm the hypothesis, restricted speech (RSP) samples were collected by having a speaker utter the test sentences without emotional expression or attention to prosodic features, and these samples were then analyzed. Furthermore, to evaluate the performance of the model, a pattern-matching process (two-level DP) was applied between the synthesized F0 pattern and the real F0 pattern of the restricted speech. We thus confirmed that our model creates a synthesized F0 pattern sufficiently similar to the restricted-speech patterns. Speech synthesized with this model sounds relatively monotonous, but is sufficiently natural compared with general spontaneous speech.
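As a rough illustration of DP-based pattern matching between two F0 contours, the following is a minimal sketch of ordinary single-level dynamic time warping. It is not the two-level DP procedure used in the paper, whose details are not given here. The function name `dtw_distance`, the absolute-difference local cost, and the sample contour values are all assumptions made for illustration only.

```python
import math


def dtw_distance(synth_f0, real_f0):
    """Accumulated DTW cost between two F0 contours (lists of floats).

    Hypothetical helper: plain single-level DTW with an absolute-difference
    local cost, not the two-level DP described in the abstract.
    """
    n, m = len(synth_f0), len(real_f0)
    # cost[i][j] = best accumulated cost aligning the first i synthesized
    # frames with the first j real frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(synth_f0[i - 1] - real_f0[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # synthesized frame advances alone
                cost[i][j - 1],      # real frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Assumed example contours in Hz (not data from the paper).
synth = [120.0, 140.0, 130.0]
real = [120.0, 140.0, 140.0, 130.0]
score = dtw_distance(synth, real)
```

A smaller accumulated cost indicates that the synthesized contour can be time-aligned more closely with the real one; in this sketch a contour that differs only by a repeated frame aligns at zero cost.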
Bibliographic reference. Hamagami, Tomoki / Magata, Ken-ichi / Komura, Mitsuo (1996): "A study on Japanese prosodic pattern and its modeling in restricted speech", In ICSLP-1996, 1628-1631.