ISCA Archive Interspeech 2013

A study on LVCSR and keyword search for Tagalog

Korbinian Riedhammer, Van Hai Do, James Hieronymus

We describe a state-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) system trained on roughly 70 hours of conversational telephone speech. Using the Kaldi speech recognition toolkit, we investigate several aspects of the system. For the acoustic front-end, we analyze the use of mel-frequency cepstral coefficients (MFCC), pitch and probability-of-voicing (PoV) features, and deep neural network (DNN) bottleneck (BN) features, as well as their feature-level combination ("tandem"). For the acoustic-phonetic decision tree, we explore different hidden Markov model (HMM) topologies for the glottalization phoneme /?/ to model its typically short duration. For the acoustic model, we compare regular continuous HMMs with a form of multi-codebook subspace Gaussian mixture model (SGMM); the two achieve best overall word error rates (WER) of 58.7% and 56.3%, respectively. The KWS is implemented as a word lattice search and is augmented by a syllable lattice back-up search to capture out-of-vocabulary keywords as well as lexical surface forms that are misrecognized because of ambiguous prefix and hyphenation rules.
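
For readers unfamiliar with the WER metric quoted above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name and the example strings are illustrative assumptions; this is not the scoring tool used by the authors, which typically also applies text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not the authors' scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (both in percent)."""
    return (baseline - improved) / baseline


# One substitution and one deletion against four reference words.
example_wer = word_error_rate("a b c d", "a x c")

# Relative WER reduction between the two figures reported in the abstract.
sgmm_gain = relative_reduction(58.7, 56.3)
```

Under this definition, the step from 58.7% to 56.3% WER corresponds to a relative reduction of roughly 4%.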

doi: 10.21437/Interspeech.2013-570

Cite as: Riedhammer, K., Do, V.H., Hieronymus, J. (2013) A study on LVCSR and keyword search for Tagalog. Proc. Interspeech 2013, 2529-2533, doi: 10.21437/Interspeech.2013-570

  author={Korbinian Riedhammer and Van Hai Do and James Hieronymus},
  title={{A study on LVCSR and keyword search for Tagalog}},
  booktitle={Proc. Interspeech 2013},