In this paper, we describe a neural-network-based spectral interpolation method for speech synthesis by rule. The method uses two types of artificial neural networks: one for phoneme recognition and one for spectral synthesis. The recognition network maps a spectrum onto a vector whose elements represent similarities to each phoneme (a phonemic vector). The spectral synthesis network performs the inverse transformation, mapping a phonemic vector back onto a spectrum. At the boundary between synthesis units, the recognition network produces two phonemic vectors; these vectors are interpolated, and the spectral synthesis network then generates the spectra of the interpolated segment. We compare the spectral characteristics of our method with those of linear interpolation. Although the spectral distortion of the proposed method is greater than that of linear interpolation, it produces very natural formant transitions, and the synthetic speech sounds natural. We also show that the proposed method can generate various types of coarticulation.
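The interpolation scheme described above can be sketched as follows. This is a minimal illustration only: the recognition and synthesis "networks" below are stand-in linear maps with a softmax, not the authors' trained networks, and the vector and spectrum dimensions (`N_PHONEMES`, `N_SPECTRUM`) are arbitrary assumptions.

```python
import numpy as np

N_PHONEMES = 5   # size of the phonemic vector (assumption)
N_SPECTRUM = 8   # number of spectral coefficients per frame (assumption)

rng = np.random.default_rng(0)
W_rec = rng.standard_normal((N_PHONEMES, N_SPECTRUM))  # stand-in for the recognition net
W_syn = rng.standard_normal((N_SPECTRUM, N_PHONEMES))  # stand-in for the synthesis net

def recognize(spectrum):
    """Map a spectrum to a phonemic vector of per-phoneme similarities."""
    v = W_rec @ spectrum
    e = np.exp(v - v.max())
    return e / e.sum()  # similarities normalized to sum to 1

def synthesize(phonemic_vec):
    """Inverse mapping: phonemic vector back to a spectrum."""
    return W_syn @ phonemic_vec

def interpolate_segment(spec_a, spec_b, n_frames):
    """At a unit boundary: recognize both endpoint spectra, interpolate
    between the two phonemic vectors, and synthesize one spectrum per frame."""
    va, vb = recognize(spec_a), recognize(spec_b)
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        v = (1.0 - t) * va + t * vb  # interpolation in phonemic space
        frames.append(synthesize(v))
    return np.stack(frames)

seg = interpolate_segment(rng.standard_normal(N_SPECTRUM),
                          rng.standard_normal(N_SPECTRUM),
                          n_frames=4)
print(seg.shape)  # (4, 8): four interpolated frames of eight coefficients
```

The key design point is that interpolation happens in the phonemic-vector space rather than directly on spectra, which is what lets the synthesis network shape the intermediate frames nonlinearly instead of producing a straight spectral cross-fade.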
Bibliographic reference. Ishikawa, Yasushi / Nakajima, Kunio (1991): "Neural network based spectral interpolation method for speech synthesis by rule", in Proc. EUROSPEECH-1991, 47-50.