4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

A New Speech Enhancement: Speech Stream Segregation

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

NTT Basic Research Laboratories, Kanagawa, Japan

Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to retrain the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word recognition accuracy up to the 10th candidate for each woman's utterance is, on average, 75%.
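The evaluation metric quoted in the abstract, cumulative accuracy up to the 10th candidate, can be illustrated with a minimal sketch. The abstract does not describe the scoring procedure, so this assumes the recognizer returns a ranked candidate list per utterance and counts an utterance as correct at rank k if the reference word appears among its top k candidates. The function name `cumulative_accuracy` and the toy word lists are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def cumulative_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    max_rank: int = 10,
) -> List[float]:
    """Return cumulative accuracy for ranks 1..max_rank.

    Entry k-1 is the fraction of utterances whose reference word
    appears among the top k recognition candidates.
    (Assumed scoring scheme; the paper's exact procedure is not given.)
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("need one candidate list per reference word")
    total = len(references)
    if total == 0:
        raise ValueError("no utterances to score")

    # hits_at_rank[k] counts utterances first recognized at rank k+1.
    hits_at_rank = [0] * max_rank
    for candidates, reference in zip(ranked_candidates, references):
        top = list(candidates[:max_rank])
        if reference in top:
            hits_at_rank[top.index(reference)] += 1

    # Accumulate the per-rank hits into a running fraction.
    cumulative = []
    running = 0
    for hits in hits_at_rank:
        running += hits
        cumulative.append(running / total)
    return cumulative


# Hypothetical toy data: four utterances, three candidates each.
toy_candidates = [
    ["aki", "eki", "iki"],  # reference found at rank 1
    ["eki", "aki", "oki"],  # reference found at rank 2
    ["oki", "uki", "iki"],  # reference found at rank 3
    ["iki", "oki", "uki"],  # reference not among the candidates
]
toy_references = ["aki", "aki", "iki", "eki"]

toy_result = cumulative_accuracy(toy_candidates, toy_references, max_rank=3)
print(toy_result)  # [0.25, 0.5, 0.75]
```

Under this reading, the reported 75% would correspond to the last entry of the list computed with `max_rank=10`, averaged over the two speakers.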

Bibliographic reference. Okuno, Hiroshi G. / Nakatani, Tomohiro / Kawabata, Takeshi (1996): "A new speech enhancement: speech stream segregation", in ICSLP-1996, 2356-2359.