13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation

Ramya Rasipuram (1,2), Mathew M. Doss (1)

(1) Idiap Research Institute, Martigny, Switzerland
(2) Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland

In recent work, we proposed an acoustic data-driven grapheme-to-phoneme (G2P) conversion approach, in which the probabilistic relationship between graphemes and phonemes, learned from acoustic data, is used together with the orthographic transcription of words to infer their phoneme sequences. In this paper, we extend that study to the problem of lexicon development for under-resourced settings. More precisely, given a small amount of transcribed speech data covering only a few words, along with the pronunciation lexicon for those words, the goal is to build a pronunciation lexicon for unseen words. Within this framework, we compare our G2P approach with a standard letter-to-sound (L2S) rule-based conversion approach. We evaluate the generated lexicons on the PhoneBook 600-word task in terms of pronunciation errors and automatic speech recognition (ASR) performance. The G2P approach yields a best ASR performance of 14.0% word error rate (WER), while the L2S approach yields a best of 13.7% WER. Combining the G2P and L2S approaches yields a best ASR performance of 9.3% WER.

Index Terms: Kullback-Leibler divergence based HMM, lexicon, grapheme, phoneme, grapheme-to-phoneme converter, letter-to-sound rules, multilayer perceptron.


Bibliographic reference. Rasipuram, Ramya / Doss, Mathew M. (2012): "Combining acoustic data driven G2P and letter-to-sound rules for under resource lexicon generation", In INTERSPEECH-2012, 1820-1823.