4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Speech stream segregation is presented as a new speech enhancement method for automatic speech recognition. Two issues are addressed: segregating a speech stream from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a three-step process: extracting harmonic fragments, grouping the extracted fragments, and substituting non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation in recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word-recognition accuracy up to the 10th candidate for each woman's utterance was 75% on average.
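The evaluation metric above, cumulative accuracy up to the N-th candidate, can be read as the fraction of utterances whose correct word appears anywhere in the recognizer's ranked top-N candidate list. The following is a minimal Python sketch under that reading; the function name `cumulative_accuracy` and the toy words are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def cumulative_accuracy(
    candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    n: int = 10,
) -> float:
    """Fraction of utterances whose reference word is among the top-n candidates.

    Assumed reading of "cumulative accuracy up to the n-th candidate":
    candidates[i] is the recognizer's ranked word list for utterance i
    (best first), and references[i] is the word actually spoken.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not references:
        raise ValueError("need at least one utterance")
    # Count an utterance as correct if its reference word appears in the first n entries.
    hits = sum(ref in cands[:n] for cands, ref in zip(candidates, references))
    return hits / len(references)


# Hypothetical toy data: ranked candidate lists for four segregated utterances.
cands = [
    ["tokyo", "kyoto", "osaka"],
    ["kobe", "nara", "kyoto"],
    ["nagoya", "sendai"],
    ["sapporo", "otaru", "hakodate"],
]
refs = ["kyoto", "kobe", "fukuoka", "hakodate"]

for n in (1, 2, 3):
    print(n, cumulative_accuracy(cands, refs, n))  # prints 0.25, 0.5, 0.75 in turn
```

Because the score is cumulative, it can only stay the same or rise as n grows. If each woman's segregated utterance is scored separately, as the abstract's wording suggests, each of the 500 mixtures would contribute two entries to `references`.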
Bibliographic reference. Okuno, Hiroshi G. / Nakatani, Tomohiro / Kawabata, Takeshi (1996): "A new speech enhancement: speech stream segregation", In ICSLP-1996, 2356-2359.