First International Conference on Spoken Language Processing (ICSLP 90)
In this paper, we describe a neural network based method for concatenating synthesis units in synthesis by rule. The proposed method uses two types of multilayer perceptrons: one network for phoneme recognition and another for spectrum production. The recognition network maps a spectrum to a vector whose elements indicate its similarity to each phoneme (a phonetic vector), and the spectral production network performs the inverse transformation of the recognition network. At the boundary between synthesis units, two phonetic vectors are calculated using the recognition network and interpolated, and the spectra of the interpolated segment are then generated by the spectral production network. We provide multiple sets of neural networks for vowels and consonants, each trained with the back-propagation algorithm. Using the proposed method, we obtained satisfactory results in realizing coarticulation, and the resulting synthetic speech is very natural.
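The boundary procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the interpolation scheme, network sizes, or activation functions, so linear interpolation, one-hidden-layer sigmoid networks, the dimensions, and all names (`mlp`, `init`, `interpolate_boundary`) are assumptions. The weights here are random and untrained, so the output only demonstrates the data flow, not realistic spectra.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """One-hidden-layer perceptron with sigmoid units (assumed architecture)."""
    h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

def init(n_in, n_hid, n_out, rng):
    """Random, untrained weights; in the paper these come from back-propagation."""
    return (rng.normal(size=(n_in, n_hid)), np.zeros(n_hid),
            rng.normal(size=(n_hid, n_out)), np.zeros(n_out))

def interpolate_boundary(spec_left, spec_right, recog, prod, n_frames):
    """Generate spectra for the segment between two synthesis units."""
    # Recognition network: spectrum -> phonetic vector, on each side of the boundary.
    v_left = mlp(spec_left, *recog)
    v_right = mlp(spec_right, *recog)
    # Linear interpolation in phonetic-vector space (assumed scheme).
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    vectors = (1.0 - t) * v_left + t * v_right  # shape (n_frames, n_phonemes)
    # Production network: phonetic vector -> spectrum, one row per frame.
    return mlp(vectors, *prod)

# Hypothetical sizes chosen only for illustration.
SPEC_DIM, N_PHONEMES, HIDDEN = 16, 5, 8
rng = np.random.default_rng(0)
recog = init(SPEC_DIM, HIDDEN, N_PHONEMES, rng)
prod = init(N_PHONEMES, HIDDEN, SPEC_DIM, rng)
left = rng.random(SPEC_DIM)    # last spectral frame of the left unit
right = rng.random(SPEC_DIM)   # first spectral frame of the right unit
frames = interpolate_boundary(left, right, recog, prod, n_frames=6)
```

In this sketch the first and last generated frames are the production network's reconstructions of the two boundary spectra, and the frames in between come from blended phonetic vectors rather than from blended spectra.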
Bibliographic reference. Ishikawa, Yasushi / Nakajima, Kunio (1990): "Neural network based concatenation method of synthesis units for synthesis by rule", In ICSLP-1990, 793-796.