13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Comparison of Grapheme-to-Phoneme Methods on Large Pronunciation Dictionaries and LVCSR Tasks

Stefan Hahn (1), Paul Vozila (2), Maximilian Bisani (2)

(1) Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany
(2) Nuance Communications, Burlington, MA, USA

Grapheme-to-phoneme conversion (G2P) is used in virtually every state-of-the-art ASR system to generalize beyond a fixed set of words. Although performance is typically already quite good (<10% phoneme error rate) and the pronunciations of important words are checked by a linguist, further improvements are still desirable, especially for end-user customization.
   In this work, we present and compare five methods/tools for the G2P task. Although most of the methods have already been published and/or are available as open source software, the experiments reported here are carried out on large state-of-the-art tasks, and the software used is that of the original publications.
   Besides an experimental comparison on text data for a range of languages (i.e. measuring G2P accuracy only), our focus in this paper is on measuring the effect of improved G2P modeling on LVCSR performance for a challenging ASR task. Additionally, the effect of using n-best pronunciation variants instead of the single best one is briefly investigated.
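
The abstract quotes a phoneme error rate (PER) without defining it. A common definition, assumed here and not taken from the paper, is the total Levenshtein distance between reference and hypothesized phoneme sequences divided by the total number of reference phonemes. The following is a minimal sketch under that assumption; the function names and the toy phoneme sequences are hypothetical and do not come from any of the tools compared in the paper.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phonemes
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]],
) -> float:
    """Assumed PER definition: summed edit distance over all
    (reference, hypothesis) pairs divided by the summed reference length."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("no reference phonemes to score against")
    return errors / total


# Toy data, invented for illustration only.
toy_pairs = [
    (["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"]),  # one substitution
    (["K", "AE", "T"], ["K", "AE", "T"]),                # exact match
]
toy_per = phoneme_error_rate(toy_pairs)
```

On the toy data this gives one error over seven reference phonemes. Evaluations that score against dictionaries with several reference pronunciations per word often take the closest reference instead; that variant is not covered by this sketch.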

Index Terms: grapheme-to-phoneme conversion, G2P, ASR


Bibliographic reference.  Hahn, Stefan / Vozila, Paul / Bisani, Maximilian (2012): "Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks", In INTERSPEECH-2012, 2538-2541.