EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Joint Speech and Audio Coding Combining Sinusoidal Modeling and Wavelet Packets

Márk Fék (1), Annamária R. Várkonyi-Kóczy (1), Jean-Marc Boucher (2)

(1) Budapest University of Technology and Economics, Hungary
(2) ENST de Bretagne, France

This paper presents a joint speech and audio coding algorithm that combines sinusoidal modeling with a perceptually adapted Wavelet Packet Transform (WPT). The input signal is band-limited to 50-7000 Hz and sampled at 16 kHz. The sinusoidal modeling stage uses a Sinusoidal Similarity Measure (SSM) to find stable sinusoidal components, and a novel pitch-harmonics-based scheme encodes the sinusoidal frequencies. The residual is obtained by subtracting the re-synthesized sinusoids from the input, and is then processed by a WPT that approximates the critical bands of the human auditory system. Perceptual Noise Substitution (PNS) is applied in noise-like WPT sub-bands to reduce the bit rate. The method provides nearly transparent quality for both speech and audio inputs. The mean bit rate of the compressed signal ranges from 32 to 62 kbps, depending on the input.
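The sinusoid-plus-residual split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SSM-based component selection, the pitch-harmonics encoding, the WPT and PNS are all omitted, and the sinusoid frequencies are simply assumed to be known. The helper names (`fit_sinusoids`, `sinusoidal_residual`, `demo`), the 512-sample frame length and the least-squares amplitude fit are assumptions made for illustration only; the 16 kHz sampling rate is the one stated above.

```python
import numpy as np

FS = 16000  # sampling rate stated in the abstract (Hz)


def fit_sinusoids(frame, freqs_hz, fs=FS):
    """Least-squares fit of cosine/sine amplitudes at known frequencies.

    Hypothetical helper: the paper selects components with its SSM,
    which is not reproduced here.
    """
    n = np.arange(len(frame))
    cols = []
    for f in freqs_hz:
        w = 2.0 * np.pi * f / fs
        cols.append(np.cos(w * n))
        cols.append(np.sin(w * n))
    basis = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    return basis, coeffs


def sinusoidal_residual(frame, freqs_hz, fs=FS):
    """Re-synthesize the fitted sinusoids and subtract them from the frame."""
    basis, coeffs = fit_sinusoids(frame, freqs_hz, fs)
    resynth = basis @ coeffs
    return frame - resynth, resynth


def demo():
    """Build a toy frame: three harmonics of 200 Hz plus weak noise."""
    rng = np.random.default_rng(0)
    n = np.arange(512)  # assumed frame length
    freqs = [200.0, 400.0, 600.0]
    amps = [1.0, 0.5, 0.25]
    phases = [0.0, 0.3, 1.1]
    tonal = sum(a * np.cos(2.0 * np.pi * f * n / FS + p)
                for a, f, p in zip(amps, freqs, phases))
    frame = tonal + 0.01 * rng.standard_normal(len(n))
    residual, resynth = sinusoidal_residual(frame, freqs)
    return frame, residual, resynth


if __name__ == "__main__":
    frame, residual, resynth = demo()
    ratio = np.sum(residual ** 2) / np.sum(frame ** 2)
    print(f"residual-to-input energy ratio: {ratio:.2e}")
```

In this toy case the residual holds only the noise-like part of the frame, which is the portion that the coder described above would pass on to the wavelet packet stage.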

