We propose a method for the enhancement of intelligibility in scenarios where speech is rendered in a noisy environment. The method is based on the hypothesis that intelligibility is a monotonic function of the degree of preservation of the speech spectral dynamics. The accuracy of the speech spectral dynamics can then be traded against the power of the rendered speech signal. We can either maximize the dynamics accuracy given the signal power, or minimize the signal power given the dynamics accuracy. In our implementation, the spectral dynamics is quantified as the difference of the mel cepstra between time frames of the speech signal. We compared the speech rendered by our implementation against both natural speech and a reference method, for the scenario where signal power is minimized given a target dynamics accuracy, and observed a significantly improved intelligibility. The low system delay, and the low complexity and memory requirements make the new method particularly suitable for real-time applications.
Bibliographic reference. Petkov, Petko N. / Kleijn, W. Bastiaan (2013): "Preservation of speech spectral dynamics enhances intelligibility", In INTERSPEECH-2013, 3597-3601.