5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

LPC Poles Tracker for Music/Speech/Noise Segmentation and Music Cancellation

Stephane H. Maes

Human Language Technologies Group, Speech Decoding Design Department, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

In automatic speech recognition (ASR) of broadcast news shows, the input utterances are often corrupted by background music and noise. This paper proposes a new method for automatically segmenting speech signals according to their background: music, clean, or noisy. LPC analysis is used to extract the poles of the associated transfer function. Based on the time evolution of the poles, it is possible to discriminate the contributions of music, speech, and noise: music poles remain stable for longer than speech poles, while noise poles are less stable over time than speech poles. Once the background of a signal is identified, poles tagged as non-speech can be separated from speech poles. Using only the speech poles along with the LPC residuals, it is possible to reconstruct a new signal freed of music and noise contributions.
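
The pole extraction and tracking idea described above can be illustrated with a minimal sketch. The abstract does not give the LPC order, frame length, window, or the measure used to judge pole stability, so everything below is an assumption made for illustration: the autocorrelation method for LPC, a Hamming window, and a hypothetical `pole_drift` function that scores how far poles move between two consecutive frames (small drift would suggest music-like poles, large drift noise-like poles). It is not the paper's algorithm.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (an assumed choice, not taken from the paper).

    Returns [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    The frame must not be all zeros, or the linear system is singular.
    """
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]  # lags 0..order
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def lpc_poles(frame, order):
    """Poles of the all-pole transfer function 1/A(z) for one frame."""
    return np.roots(lpc_coefficients(frame, order))

def pole_drift(poles_prev, poles_next):
    """Hypothetical stability score: mean distance in the z-plane from each
    pole of the previous frame to its nearest pole in the next frame."""
    distances = np.abs(poles_prev[:, None] - poles_next[None, :])
    return float(distances.min(axis=1).mean())
```

As a usage example, a steady tone sampled so that its normalized frequency is pi/4 rad/sample should produce a pole pair near that angle in every frame, and therefore a small drift between frames:

```python
n = np.arange(800)
tone = np.sin(np.pi / 4 * n)
poles_a = lpc_poles(tone[:400], order=2)
poles_b = lpc_poles(tone[203:603], order=2)
drift = pole_drift(poles_a, poles_b)  # close to zero for a stationary tone
```

A real segmenter would need a higher LPC order, a rule for following individual poles across many frames, and decision thresholds, none of which are specified in the abstract.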


Bibliographic reference.  Maes, Stephane H. (1997): "LPC poles tracker for music/speech/noise segmentation and music cancellation", In EUROSPEECH-1997, 1131-1134.