13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Training Deep Nets with Imbalanced and Unlabeled Data

Jeff Berry (1,3), Ian Fasel (2), Luciano Fadiga (3,4), Diana Archangeli (1)

(1) Department of Linguistics, University of Arizona, Tucson, AZ, USA
(2) School of Information Science, Technology, and Arts, University of Arizona, Tucson, AZ, USA
(3) Robotics, Brain and Cognitive Sciences, Istituto Italiano di Tecnologia, Genova, Italy
(4) Section of Human Physiology, University of Ferrara, Ferrara, Italy

Training deep belief networks (DBNs) is normally done with large data sets. Our goal is to predict traces of the surface of the tongue in ultrasound images of human speech. Hand-tracing is labor-intensive, and the dataset is highly imbalanced because many images are extremely similar. We propose a bootstrapping method that handles this imbalance by iteratively selecting a small subset of images to be hand-traced (thereby reducing human labor time) and then (re)training the DBN, using an entropy-based diversity measure for the initial selection. The method achieves more than a two-fold reduction in the human time required for tracing while maintaining human-level accuracy.
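
The select, trace, and retrain loop described above can be illustrated with a minimal Python sketch. The abstract does not specify the diversity measure or how later subsets are chosen, so everything here is an assumption made for illustration: `image_entropy` uses the Shannon entropy of a coarse intensity histogram as a stand-in diversity score, `select_diverse` spreads the initial picks across the entropy ranking, and `hand_trace`, `train`, and `pick_next` are hypothetical callbacks standing in for the human annotator, DBN training, and the later-round selection rule.

```python
import math
from collections import Counter


def image_entropy(pixels, bins=16, max_value=255):
    """Shannon entropy (in bits) of a coarse intensity histogram.

    `pixels` is a flat sequence of integer intensities in [0, max_value].
    This is an assumed stand-in for the paper's diversity measure.
    """
    counts = Counter(min(p * bins // (max_value + 1), bins - 1) for p in pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_diverse(images, k):
    """Return k distinct image indices spread evenly over the entropy ranking.

    Sampling the ranking evenly avoids drawing most picks from a dense
    cluster of near-identical images (an assumed heuristic).
    """
    n = len(images)
    k = min(k, n)
    if k <= 0:
        return []
    ranked = sorted(range(n), key=lambda i: image_entropy(images[i]))
    if k == 1:
        return [ranked[n // 2]]
    # Integer arithmetic keeps the chosen positions distinct whenever k <= n.
    return [ranked[j * (n - 1) // (k - 1)] for j in range(k)]


def bootstrap(images, hand_trace, train, pick_next, rounds, batch_size):
    """Iteratively hand-trace a small batch, then (re)train on all traces so far.

    hand_trace(image) -> trace                       (hypothetical: human annotator)
    train(list of (image, trace)) -> model           (hypothetical: DBN training)
    pick_next(model, images, remaining, n) -> indices (hypothetical: later selection)
    """
    labeled = {}
    model = None
    batch = select_diverse(images, batch_size)  # entropy-based initial selection
    for _ in range(rounds):
        for i in batch:
            labeled[i] = hand_trace(images[i])
        model = train([(images[i], trace) for i, trace in labeled.items()])
        remaining = [i for i in range(len(images)) if i not in labeled]
        if not remaining:
            break
        batch = pick_next(model, images, remaining, batch_size)
    return model, labeled
```

In practice `pick_next` might favor images on which the current model performs poorly, but that choice is outside what the abstract states and is left to the caller here.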

Index Terms: deep belief networks, ultrasound imaging, tongue imaging, speech processing, bootstrapping, class imbalance problem

Bibliographic reference.  Berry, Jeff / Fasel, Ian / Fadiga, Luciano / Archangeli, Diana (2012): "Training deep nets with imbalanced and unlabeled data", In INTERSPEECH-2012, 1756-1759.