5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Inference Of Missing Spectrographic Features For Robust Speech Recognition

Bhiksha Raj, Rita Singh, Richard M. Stern

Carnegie Mellon University, USA

Two types of algorithms are introduced that recover missing time-frequency regions of log-spectral representations of speech. In contrast to previously described approaches, these compensation algorithms modify the incoming feature vectors and require no changes to the speech recognition system. The first approach clusters the log-spectral vectors representing clean speech. Missing data are recovered by estimating the spectral cluster in each analysis frame on the basis of the feature values that are present. The second approach uses MAP procedures to estimate the values of missing data elements based on their correlation with the features that are present. The greatest recognition accuracy was obtained using the correlation-based approach, presumably because of its ability to exploit the temporal as well as the spectral structure of speech. The recognition accuracy provided by these algorithms approaches but does not exceed that obtained by traditional marginalization. Nevertheless, it is believed that these algorithms provide greater computational efficiency and enable greater flexibility in recognition system structure.
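
The two reconstruction ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a nearest-centroid simplification of the cluster-based approach, and a joint Gaussian over one missing and one observed feature for the correlation-based approach, in which case the MAP estimate equals the conditional mean. The function names `reconstruct_from_clusters` and `map_estimate_scalar`, and all numeric values, are hypothetical.

```python
def reconstruct_from_clusters(frame, centroids):
    """Fill missing entries (None) of one log-spectral frame.

    Assumption: the cluster is chosen as the centroid nearest to the
    frame, measured only over the components that are present.
    """
    observed = [i for i, v in enumerate(frame) if v is not None]

    def distance(centroid):
        # Squared Euclidean distance restricted to observed components.
        return sum((frame[i] - centroid[i]) ** 2 for i in observed)

    best = min(centroids, key=distance)
    # Keep observed values; take missing ones from the chosen centroid.
    return [best[i] if v is None else v for i, v in enumerate(frame)]


def map_estimate_scalar(x_obs, mu_obs, mu_miss, var_obs, cov):
    """Estimate one missing value from one correlated observed value.

    Assumption: the pair is jointly Gaussian, so the MAP estimate is the
    conditional mean mu_miss + (cov / var_obs) * (x_obs - mu_obs).
    """
    return mu_miss + (cov / var_obs) * (x_obs - mu_obs)


# Hypothetical clean-speech centroids and a frame with one missing band.
centroids = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
frame = [9.5, None, 12.5]
reconstructed = reconstruct_from_clusters(frame, centroids)

# Hypothetical statistics relating a missing element to a present one.
estimate = map_estimate_scalar(x_obs=2.0, mu_obs=1.0, mu_miss=5.0,
                               var_obs=4.0, cov=2.0)
```

In both cases only the feature vector is altered, so the reconstructed frame could be passed to an unmodified recognizer, which is the property the abstract emphasizes.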

Bibliographic reference. Raj, Bhiksha / Singh, Rita / Stern, Richard M. (1998): "Inference of missing spectrographic features for robust speech recognition", in ICSLP-1998, paper 1152.