In this paper, we use prosodic information to improve the accuracy of our template-based automatic speech recognizer. Prosodic information is harvested with a data-driven approach: a number of prosodic features are extracted, combined into major groups, and then studied both separately and together. All acoustic evidence, both segmental and suprasegmental, is modelled non-parametrically. The different sources of information are combined with segmental conditional random fields. Adding prosody reduces the word error rate of the state-of-the-art baseline by 7% relative on the Wall Street Journal nov92 20k trigram task.
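To make the "7% relative" figure concrete, the following minimal sketch shows how a relative word error rate (WER) reduction is computed, as opposed to an absolute one. The function name and the baseline and improved WER values are hypothetical, chosen only for illustration; they are not the figures reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent, for illustration only (not from the paper).
baseline = 10.0
improved = 9.3

absolute = baseline - improved                         # reduction in percentage points
relative = relative_wer_reduction(baseline, improved)  # fraction of the baseline

print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.1%}")
```

Under these assumed values, a drop of 0.7 percentage points corresponds to a 7% relative reduction, which is why relative figures depend on how strong the baseline already is.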
Bibliographic reference. Seppi, Dino / Demuynck, Kris / Van Compernolle, Dirk (2011): "Template-based automatic speech recognition meets prosody", in INTERSPEECH-2011, 545-548.