Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


A Physical Approach to Speech Quality Assessment: Correlation Patterns in the Speech Spectrogram

Tammo Houtgast, Jan A. Verhave

TNO Institute for Perception, Soesterberg, The Netherlands

A bank of filters has been implemented digitally to obtain, with running speech as input, energy values within well-defined, Gaussian-shaped frequency-time windows. The analysis concentrates on the correlation between the dB outputs of pairs of different windows, with the frequency spacing and/or the time spacing between the two windows as parameters. The resulting correlation patterns reflect, in a global way, the statistics of the dynamic characteristics of running speech in both the frequency and the time domain. Various aspects of such correlation patterns are considered briefly, illustrating interesting relations with some basic features of hearing and speech intelligibility. The main issue concerns the possible usefulness of this global measure for speech quality assessment. It is found that the correlation patterns derived from natural speech have a typical structure, providing a basis for judging the degree of "naturalness" of a token of synthetic speech.
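As an illustration of the kind of measure described above, the following minimal sketch computes the correlation between the dB levels of two windows as a function of their frequency spacing (band index) and time spacing (frame lag). It is not the authors' implementation: it assumes the band levels have already been obtained in dB, one value per band per frame, and it omits the Gaussian frequency-time windowing entirely. All names (`pearson`, `window_correlation`, `correlation_pattern`, `toy_levels`) and the synthetic test signal are hypothetical, introduced only for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def window_correlation(levels_db, band_a, band_b, lag):
    """Correlate band_a at frame t with band_b at frame t + lag (lag >= 0).

    levels_db is a list of bands, each a list of dB levels per frame
    (an assumed input format, not taken from the paper).
    """
    a = levels_db[band_a]
    b = levels_db[band_b]
    if lag < 0 or lag >= len(a):
        raise ValueError("lag must satisfy 0 <= lag < number of frames")
    return pearson(a[:len(a) - lag], b[lag:])


def correlation_pattern(levels_db, ref_band, max_lag):
    """Correlation of every band against ref_band for lags 0..max_lag.

    Returns pattern[band][lag].
    """
    return [
        [window_correlation(levels_db, ref_band, band, lag)
         for lag in range(max_lag + 1)]
        for band in range(len(levels_db))
    ]


def toy_levels(n_frames=200):
    """Synthetic dB levels for three bands (illustrative data only)."""
    band0 = [60 + 10 * math.sin(2 * math.pi * t / 25) for t in range(n_frames)]
    band1 = [58 + 8 * math.sin(2 * math.pi * t / 25)
             + 2 * math.sin(2 * math.pi * t / 7) for t in range(n_frames)]
    band2 = [50 + 5 * math.cos(2 * math.pi * t / 11) for t in range(n_frames)]
    return [band0, band1, band2]


if __name__ == "__main__":
    pattern = correlation_pattern(toy_levels(), ref_band=0, max_lag=3)
    for band, row in enumerate(pattern):
        print(band, [round(r, 2) for r in row])
```

In this toy data the first two bands share a common slow modulation and the third does not, so the pattern for the reference band falls off with band index; with real speech, the input would instead be the dB outputs of the filter bank.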

