4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
To explain why natural and synthetic speech perform differently at different linguistic levels over the telephone line, we analyzed data from an experiment in which 108 randomized stimuli were presented to 96 subjects. Subjects were asked to identify the consonant in 51 CV and 57 VCV words, both meaningful and meaningless. There were 20 listening conditions: 10 systems, each presented both under good listening conditions and over the telephone line. The 10 systems were 6 TTS systems (3 formant-based (SF) and 3 diphone-based (SD)), a pure natural voice (NV), and natural voice at 3 signal-to-noise (S/N) ratios (6, 0, and -6 dB). Comparing the consonant confusions for natural and synthetic speech at comparable overall intelligibility levels showed that the distributions of confusions often differed considerably in each condition. Analyses of several spectrograms suggest that these confusions are due to problems in the phonetic rules and to the telephone line.
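The key observation is that two systems can reach the same overall intelligibility while making quite different consonant errors. A minimal sketch of that idea follows, assuming each trial is recorded as a (presented, responded) consonant pair. The abstract does not say which statistic was used to compare confusion distributions; total variation distance is used here purely as an illustrative assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def accuracy(trials):
    """Fraction of trials where the responded consonant matches the presented one."""
    return sum(p == r for p, r in trials) / len(trials)


def error_distribution(trials):
    """Relative frequency of each (presented, responded) confusion among errors only."""
    errors = Counter((p, r) for p, r in trials if p != r)
    total = sum(errors.values())
    if total == 0:
        return {}  # no confusions at all
    return {pair: n / total for pair, n in errors.items()}


def total_variation(d1, d2):
    """Illustrative distance between two confusion distributions (0 = identical, 1 = disjoint)."""
    keys = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)


# Hypothetical toy data, not from the experiment.
natural = [("p", "p"), ("p", "t"), ("b", "b"), ("d", "b")]
synthetic = [("p", "p"), ("b", "b"), ("b", "v"), ("d", "g")]

print(accuracy(natural), accuracy(synthetic))  # 0.5 0.5
print(total_variation(error_distribution(natural),
                      error_distribution(synthetic)))  # 1.0
```

In this toy case both systems score 50% overall, yet they share no confusions, which is the kind of pattern the comparison above is designed to expose.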
Bibliographic reference. Delogu, Cristina / Paoloni, Andrea / Ragazzini, Susanna / Ridolfi, Paola (1996): "Spectral analysis of synthetic speech and natural speech with noise over the telephone line", In ICSLP-1996, 1409-1412.