13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring

Josef R. Novak (1), Paul R. Dixon (2), Nobuaki Minematsu (1), Keikichi Hirose (1), Chiori Hori (2), Hideki Kashioka (2)

(1) Graduate School of Information Science and Technology, The University of Tokyo, Japan
(2) National Institute of Information and Communications Technology, Kyoto, Japan

This work introduces a modified WFST-based, multiple-to-multiple, EM-driven alignment algorithm for Grapheme-to-Phoneme (G2P) conversion, and presents preliminary experimental results on using a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints, which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.
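
As an illustration of the N-best rescoring idea mentioned in the abstract, the following minimal Python sketch re-ranks a list of pronunciation hypotheses by combining each hypothesis's first-pass cost with a second-pass language model cost. It is a sketch under stated assumptions, not the authors' implementation: the linear interpolation, the `lm_weight` parameter, the function names, and the toy scorer standing in for an RNNLM are all hypothetical, since the abstract does not specify how the scores are combined.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (phoneme sequence, first-pass cost), where the cost is
# assumed to be a negative log probability, so lower is better.
Hypothesis = Tuple[Sequence[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_cost: Callable[[Sequence[str]], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank hypotheses by an interpolated cost (assumed combination rule)."""
    rescored = [
        (phones, (1.0 - lm_weight) * cost + lm_weight * lm_cost(phones))
        for phones, cost in nbest
    ]
    # Sort ascending: the lowest combined cost becomes the new 1-best.
    return sorted(rescored, key=lambda hyp: hyp[1])


def toy_lm_cost(phones: Sequence[str]) -> float:
    """Hypothetical stand-in for an RNNLM: charges more for the phoneme 'AH'."""
    return sum(2.0 if p == "AH" else 0.5 for p in phones)


# Made-up 2-best list for illustration only; not data from the paper.
nbest = [
    (("T", "AH", "S", "T"), 1.0),
    (("T", "EH", "S", "T"), 1.2),
]
reranked = rescore_nbest(nbest, toy_lm_cost, lm_weight=0.5)
best_phones, best_cost = reranked[0]
```

With `lm_weight=0.0` the first-pass ordering is preserved, which makes the weight a convenient knob for checking how much the second-pass model changes the ranking.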

Index Terms: G2P, Alignment, RNNLM, WFST

Bibliographic reference. Novak, Josef R. / Dixon, Paul R. / Minematsu, Nobuaki / Hirose, Keikichi / Hori, Chiori / Kashioka, Hideki (2012): "Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring", In INTERSPEECH-2012, 2526-2529.