14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Stochastic-Deterministic Signal Modelling for the Tracking of Pitch in Noise and Speech Mixtures Using Factorial HMMs

Matthew McCallum, Bernard Guillemin

University of Auckland, New Zealand

Obtaining estimates of the fundamental frequencies associated with the noise, the speech, or both in noise/speech mixtures can be important in speech enhancement. Accurate simultaneous estimation of these can yield both improved subjective quality and a higher signal-to-noise ratio (SNR) in the enhanced speech. Such an algorithm must reliably identify each periodic component as either noise or speech. It must also be robust to the changing SNR of noisy speech that arises from a range of environmental conditions. In this paper, a multipitch tracking algorithm is proposed based on a stochastic-deterministic (SD) signal model in the complex short-time Fourier transform (STFT) framework, using a factorial hidden Markov model (FHMM). Unlike previous multipitch tracking algorithms based on FHMMs, the proposed algorithm performs well even when the levels of noise and speech differ significantly from those of the training data. This robustness is attributed in part to the flexible SD model employed. With this model, the a priori information about noise and speech that is used to identify and track non-stationary periodic components is based primarily on their spectral envelopes rather than their absolute amplitudes.
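To illustrate what tracking with an FHMM involves, the sketch below shows generic joint Viterbi decoding for a two-chain factorial HMM, where one chain might represent a quantised noise pitch and the other a quantised speech pitch. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `fhmm_viterbi`, the per-frame joint log-likelihood table `frame_loglik` (which in the paper would come from the SD signal model in the STFT domain), and the toy pitch grids are all hypothetical. The only structural assumption is that the two chains evolve independently, so the joint transition probability factorises into a product of per-chain transitions, while each observed frame depends on both chains at once.

```python
import math
from itertools import product


def _log(p):
    # Map zero probabilities to -inf instead of raising ValueError.
    return math.log(p) if p > 0 else float("-inf")


def fhmm_viterbi(init_a, trans_a, init_b, trans_b, frame_loglik):
    """Most likely joint state path of a two-chain factorial HMM (hypothetical helper).

    init_x[i]               : initial probability of state i in chain x
    trans_x[i][j]           : probability that chain x moves from state i to j
    frame_loglik[t][(i, k)] : log p(frame t | chain A in state i, chain B in state k)
    """
    states = list(product(range(len(init_a)), range(len(init_b))))
    # The chains are a priori independent, so joint initial log-probs add.
    delta = {s: _log(init_a[s[0]]) + _log(init_b[s[1]]) + frame_loglik[0][s]
             for s in states}
    backptrs = []
    for t in range(1, len(frame_loglik)):
        new_delta, ptr = {}, {}
        for j, l in states:
            # Joint transition log-prob is the sum of the per-chain log-probs.
            def score(s):
                return delta[s] + _log(trans_a[s[0]][j]) + _log(trans_b[s[1]][l])
            prev = max(states, key=score)
            new_delta[(j, l)] = score(prev) + frame_loglik[t][(j, l)]
            ptr[(j, l)] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final joint state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy usage: two 2-state chains with "sticky" transitions and per-frame
# log-likelihoods that clearly favour one joint state per frame.
target = [(0, 1), (0, 1), (1, 1)]
sticky = [[0.9, 0.1], [0.1, 0.9]]
loglik = [{s: (0.0 if s == tgt else -10.0) for s in product(range(2), range(2))}
          for tgt in target]
path = fhmm_viterbi([0.5, 0.5], sticky, [0.5, 0.5], sticky, loglik)
```

Exhaustive search over the product state space costs on the order of T·(N_A·N_B)² operations for T frames, which is manageable for the small toy grids used here but grows quickly with finer pitch resolution.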


Bibliographic reference.  McCallum, Matthew / Guillemin, Bernard (2013): "Stochastic-deterministic signal modelling for the tracking of pitch in noise and speech mixtures using factorial HMMs", In INTERSPEECH-2013, 3289-3293.