FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and
Auditory-Visual Speech Processing

Vienna, Austria
September 11-13, 2015

Stream Weight Estimation using Higher Order Statistics in Multi-modal Speech Recognition

Kazuto Ukai, Satoshi Tamura, Satoru Hayamizu

Department of Information Science, Gifu University, Gifu, Japan

This paper examines stream weight optimization for multi-modal speech recognition that uses audio and visual information. A conventional multi-stream Hidden Markov Model (HMM) for multi-modal speech recognition constrains the audio and visual stream weights to sum to one, which fixes the balance between the transition and observation probabilities of the HMM. We study an effective indicator for estimating the weights when this constraint is released. Recognition experiments were conducted using the audio-visual corpus CENSREC-1-AV [1]. In noisy environments, removing the constraint is shown to improve recognition accuracy. Subsequently, stream weights based on a higher-order statistic (kurtosis) were proposed and tested. The recognition experiments show that the proposed stream weights are successful.

Index Terms: stream weight optimization, multi-modal speech recognition, kurtosis, multi-stream HMM.
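The weighting scheme described in the abstract can be illustrated with a minimal sketch. It assumes the common multi-stream formulation in which the observation log-likelihood of a state is a weighted sum of the audio and visual log-likelihoods. All function names and numeric values below are hypothetical and chosen for illustration only. The abstract does not say which quantity the kurtosis is computed over or how it is mapped to stream weights, so the sketch shows only the standard (population) excess kurtosis and does not attempt to reproduce the proposed estimator.

```python
import math


def weighted_observation_log_likelihood(log_b_audio, log_b_visual, w_audio, w_visual):
    """Weighted sum of per-stream log observation likelihoods (assumed formulation)."""
    return w_audio * log_b_audio + w_visual * log_b_visual


def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


# Hypothetical values for one HMM transition and one frame.
log_transition = math.log(0.5)
log_b_audio, log_b_visual = -4.0, -2.0

# Conventional case: the two weights sum to one.
constrained = weighted_observation_log_likelihood(log_b_audio, log_b_visual, 0.7, 0.3)

# Constraint released: the weights no longer sum to one, so the observation
# term is rescaled relative to the unweighted transition term.
released = weighted_observation_log_likelihood(log_b_audio, log_b_visual, 0.7, 0.6)

print("constrained path score:", log_transition + constrained)
print("released path score:   ", log_transition + released)
```

Because the transition log-probability is not weighted, changing the sum of the stream weights changes how strongly the observation term counts against it. This is the balance that the sum-to-one constraint keeps fixed.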


Bibliographic reference.  Ukai, Kazuto / Tamura, Satoshi / Hayamizu, Satoru (2015): "Stream weight estimation using higher order statistics in multi-modal speech recognition", In FAAVSP-2015, 181-184.