4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A New Speech Synthesis System Based on the ARX Speech Production Model

Weizhong Zhu, Hideki Kasuya

Faculty of Engineering, Utsunomiya University, Utsunomiya, Japan

In this paper, we present a new formant-type speech analysis-synthesis system based on the ARX (Auto-Regressive with Exogenous Input) speech production model. The model consists of cascade formant-antiformant synthesizers driven by a voicing source and an unvoiced turbulent noise source. A key feature of the proposed method is an algorithm that automatically estimates the voicing source, unvoiced source, and formant-antiformant parameters of the synthesizer directly from natural speech waveforms. Once the parameters have been estimated from natural speech, they can be manipulated with a flexible editing tool developed as part of the system. By changing the values of the fundamental frequency, glottal open quotient, spectral tilt parameter, turbulent noise level, and formant-antiformant frequencies and bandwidths, we can synthesize natural-sounding speech with various voice qualities, including modal, breathy, tense, and whispered voice. The acoustic correlates of these voice qualities can be investigated systematically using the proposed system. Since our analysis-editing-synthesis system has been developed on the MS-Windows platform, we expect it to be a useful tool in various basic areas of speech science and technology.
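The synthesis side of such a system can be illustrated with a minimal Python sketch. It is not the authors' implementation: the polynomial glottal pulse, the Klatt-style second-order resonator formulas, and every function and parameter name below are assumptions chosen for illustration. The spectral tilt parameter is not modeled, and as a simplification both sources are summed and passed through the same cascade.

```python
import numpy as np

def resonator_coeffs(freq_hz, bw_hz, fs):
    # Second-order digital resonator coefficients (Klatt-style formulas, assumed here).
    c = -np.exp(-2.0 * np.pi * bw_hz / fs)
    b = 2.0 * np.exp(-np.pi * bw_hz / fs) * np.cos(2.0 * np.pi * freq_hz / fs)
    a = 1.0 - b - c  # unity gain at 0 Hz
    return a, b, c

def resonator(x, freq_hz, bw_hz, fs):
    # Formant (pole pair): y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n in range(len(x)):
        y[n] = a * x[n] + b * y1 + c * y2
        y2, y1 = y1, y[n]
    return y

def antiresonator(x, freq_hz, bw_hz, fs):
    # Antiformant (zero pair): exact inverse of the resonator with the same settings.
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    xp = np.concatenate(([0.0, 0.0], x))
    return (xp[2:] - b * xp[1:-1] - c * xp[:-2]) / a

def voicing_source(f0_hz, open_quotient, dur_s, fs):
    # Hypothetical glottal flow pulse: x^2 * (1 - x) during the open phase, zero when closed.
    n = np.arange(int(round(dur_s * fs)))
    phase = (n * f0_hz / fs) % 1.0
    x = phase / open_quotient
    return np.where(x < 1.0, x**2 * (1.0 - x), 0.0)

def synthesize(f0_hz, open_quotient, noise_level, formants,
               antiformants=(), dur_s=0.3, fs=16000, seed=0):
    # formants / antiformants: sequences of (frequency_hz, bandwidth_hz) pairs.
    rng = np.random.default_rng(seed)
    flow = voicing_source(f0_hz, open_quotient, dur_s, fs)
    noise = noise_level * rng.standard_normal(len(flow))
    # First difference of the flow approximates its derivative; add turbulent noise.
    y = np.diff(flow, prepend=0.0) + noise
    for f, bw in formants:
        y = resonator(y, f, bw, fs)
    for f, bw in antiformants:
        y = antiresonator(y, f, bw, fs)
    return y

# Example: an /a/-like vowel with rough, illustrative formant values.
vowel = synthesize(120.0, 0.6, 0.001, [(700, 80), (1200, 90), (2500, 120)])
```

In this sketch, raising `open_quotient` and `noise_level` pushes the output toward a breathier quality, which loosely mirrors the kind of parameter editing the abstract describes.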


Bibliographic reference.  Zhu, Weizhong / Kasuya, Hideki (1996): "A new speech synthesis system based on the ARX speech production model", In ICSLP-1996, 1413-1416.