14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Sequence-Discriminative Training of Deep Neural Networks

Karel Veselý (1), Arnab Ghoshal (2), Lukáš Burget (1), Daniel Povey (3)

(1) Brno University of Technology, Czech Republic
(2) University of Edinburgh, UK
(3) Johns Hopkins University, USA

Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300-hour American English conversational telephone speech task. Four sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (bMMI). Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and bMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 8.9% relative, on average, with little difference observed between the individual sequence-based criteria. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
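The frame-rejection heuristic described in the abstract can be illustrated with a minimal sketch (not the paper's Kaldi implementation): the per-frame MMI gradient with respect to the log-posteriors is the numerator occupancy minus the denominator occupancy, and frames where the denominator lattice assigns no mass to any state the numerator occupies are dropped. The function name and array layout here are illustrative assumptions.

```python
import numpy as np

def mmi_frame_gradients(num_post, den_post, reject_disjoint=True):
    """Sketch of per-frame MMI gradients with frame rejection.

    num_post, den_post: (T, S) arrays of state occupancies from the
    numerator (reference) and denominator lattices, respectively.
    Returns the (T, S) gradient num_post - den_post, with "disjoint"
    frames zeroed out when reject_disjoint is True.
    """
    grad = num_post - den_post
    if reject_disjoint:
        # A frame is disjoint if the denominator lattice gives zero
        # mass to every state the numerator occupies at that frame.
        overlap = (num_post * den_post).sum(axis=1)
        grad[overlap == 0.0] = 0.0
    return grad
```

In this sketch, rejected frames contribute nothing to the parameter update, which is the effect the heuristic aims for: such frames would otherwise produce large, poorly matched gradients.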


Bibliographic reference: Veselý, Karel / Ghoshal, Arnab / Burget, Lukáš / Povey, Daniel (2013): "Sequence-discriminative training of deep neural networks", in INTERSPEECH 2013, 2345-2349.