In this paper we investigate a new statistical excitation mapping technique to enhance throat-microphone speech using joint analysis of throat- and acoustic-microphone recordings. In a recent study we employed source-filter decomposition to enhance the spectral envelope of throat-microphone recordings. Within the source-filter decomposition framework, we observed that the spectral envelope difference between the excitation signals of throat- and acoustic-microphone recordings is an important source of degradation in throat-microphone voice quality. In this study we model the spectral envelope difference of the excitation signals as a spectral tilt vector, and we propose a new phone-dependent GMM-based spectral tilt mapping scheme to enhance the throat excitation signal. Experiments are performed to evaluate the proposed excitation mapping scheme against state-of-the-art throat-microphone speech enhancement techniques using both objective and subjective evaluations. Objective evaluations are performed with the wideband perceptual evaluation of speech quality (ITU-PESQ) metric. Subjective evaluations are performed with an A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed statistical excitation mapping consistently delivers higher improvements than statistical mapping of the spectral envelope for enhancing throat-microphone recordings.
Bibliographic reference. Turan, M. A. Tuğtekin / Erzin, Engin (2013): "A new statistical excitation mapping for enhancement of throat microphone recordings", In INTERSPEECH-2013, 3244-3248.
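The abstract names a phone-dependent GMM-based mapping but does not give its form. As a rough illustration only, the sketch below shows the common joint-density GMM regression, in which a target vector y (here standing in for an acoustic-side spectral tilt vector) is estimated from a source vector x (a throat-side feature vector) as the conditional mean E[y | x]. This choice of estimator, the function names `log_gaussian`, `gmm_map` and `phone_dependent_map`, the `phone_models` dictionary, and the feature layout are all assumptions made for illustration and are not taken from the paper. The GMM parameters are assumed to have been trained beforehand on joint vectors z = [x; y].

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at x."""
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + log_det + len(x) * np.log(2.0 * np.pi))


def gmm_map(x, weights, means, covs, dx):
    """Estimate E[y | x] under a joint GMM over z = [x; y].

    weights: K mixture weights; means: K joint mean vectors;
    covs: K joint covariance matrices; dx: dimension of x.
    """
    log_resp = []
    cond_means = []
    for w, mu, sigma in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx = sigma[:dx, :dx]
        s_yx = sigma[dx:, :dx]
        # Unnormalised log-posterior of this component given x.
        log_resp.append(np.log(w) + log_gaussian(x, mu_x, s_xx))
        # Component-wise conditional mean of y given x.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())  # subtract max for stability
    resp /= resp.sum()
    return resp @ np.stack(cond_means)


def phone_dependent_map(x, phone, phone_models, dx):
    """Select the GMM for the given phone label (hypothetical layout:
    phone_models[phone] = (weights, means, covs)) and apply the mapping."""
    weights, means, covs = phone_models[phone]
    return gmm_map(x, weights, means, covs, dx)
```

In this sketch the phone dependence is nothing more than a lookup of a separate parameter set per phone label; how phone labels are obtained at enhancement time is outside the scope of the example.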