Lipreading using deep bottleneck features for optical and depth images

Satoshi Tamura, Koichi Miyazaki, Satoru Hayamizu


This paper investigates a lipreading scheme that employs optical and depth modalities together with deep bottleneck features. Optical and depth data are captured by Microsoft Kinect v2, and an appearance-based feature set is computed for each modality. Each basic feature set is then converted into deep bottleneck features using a deep neural network with a bottleneck layer. Multi-stream hidden Markov models are used for recognition. We evaluated the method on our connected-digit corpus, comparing it with our previous method, and found that deep bottleneck features improve lipreading performance.
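As a rough illustration of the deep bottleneck feature step described above, the following PyTorch sketch shows a feed-forward network trained as a classifier whose narrow hidden layer is afterwards used as the feature extractor. The layer sizes, input dimensionality, and number of targets are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    # Hypothetical sketch: a feed-forward DNN whose narrow hidden
    # ("bottleneck") layer provides the deep bottleneck feature.
    # The 39-dim input, 1024-unit hidden layers, 30-dim bottleneck,
    # and 11 targets are assumptions for illustration only.
    def __init__(self, input_dim=39, bottleneck_dim=30, num_classes=11):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, bottleneck_dim),   # bottleneck layer
        )
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(bottleneck_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),      # training targets
        )

    def forward(self, x):
        # Used during supervised training of the whole network.
        return self.classifier(self.encoder(x))

    def extract_dbnf(self, x):
        # After training, discard the classifier and keep the
        # bottleneck activations as the deep bottleneck feature,
        # which would then feed the multi-stream HMM recognizer.
        with torch.no_grad():
            return self.encoder(x)

In this sketch one such network would be trained per modality (optical and depth), and the extracted bottleneck features from each stream would be passed to the multi-stream HMMs.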


Cite as: Tamura, S., Miyazaki, K., Hayamizu, S. (2017) Lipreading using deep bottleneck features for optical and depth images. Proc. The 14th International Conference on Auditory-Visual Speech Processing, 76-77.


@inproceedings{Tamura2017,
  author={Satoshi Tamura and Koichi Miyazaki and Satoru Hayamizu},
  title={Lipreading using deep bottleneck features for optical and depth images},
  year=2017,
  booktitle={Proc. The 14th International Conference on Auditory-Visual Speech Processing},
  pages={76--77}
}