INTERSPEECH 2006 - ICSLP
Pronunciation lexicons often contain pronunciation variants. This creates two problems: variants can be difficult to define in an internally consistent way, and it can be difficult to extract generalised grapheme-to-phoneme rule sets from a lexicon that contains them. In this paper we address both issues by creating ‘pseudo-phonemes’, each associated with a set of ‘generation restriction rules’, to model those pronunciations that are consistently realised as two or more variants. By pre-processing and post-processing the lexicon appropriately, grapheme-to-phoneme algorithms that previously could not handle pronunciation variants can easily be extended to incorporate them, without any changes to the standard algorithms. We evaluate the effectiveness of this approach using the Default&Refine rule extraction algorithm, and apply the method to both the English Oxford Advanced Learner's Dictionary (OALD) and the Flemish FONILEX pronunciation lexicon. We find that the approach generalises to different languages, models phonemic variation accurately, and can identify inconsistent variants in pre-existing lexicons.
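The pre-/post-processing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the pseudo-phoneme inventory `PSEUDO_PHONEMES`, the helper names `collapse` and `expand`, and the SAMPA-style symbols are all hypothetical, and the "generation restriction rules" are simplified to a plain set of phonemes each pseudo-phoneme may generate. The sketch also assumes the variants of a word are already aligned and of equal length. Pre-processing merges a word's variants into a single pronunciation containing pseudo-phonemes; post-processing expands pseudo-phonemes back into concrete variants. A variant set that cannot be reproduced exactly by expansion is flagged as inconsistent.

```python
from itertools import product

# Hypothetical pseudo-phoneme inventory (illustrative only): each symbol
# stands for a set of phonemes that are consistently realised as alternatives.
PSEUDO_PHONEMES = {
    "@|I": frozenset({"@", "I"}),
    "i:|aI": frozenset({"i:", "aI"}),
}


def expand(pron):
    """Post-processing: replace every pseudo-phoneme by each phoneme it may generate."""
    options = [sorted(PSEUDO_PHONEMES.get(p, {p})) for p in pron]
    return [tuple(choice) for choice in product(*options)]


def collapse(variants):
    """Pre-processing: merge aligned variants into one pronunciation.

    Returns None when the variants cannot be modelled consistently, i.e. a
    differing position matches no pseudo-phoneme, or expanding the merged
    pronunciation would generate variants that are not in the lexicon.
    """
    variants = [tuple(v) for v in variants]
    if len({len(v) for v in variants}) != 1:
        return None  # this simple sketch only handles aligned, equal-length variants
    merged = []
    for column in zip(*variants):
        phones = frozenset(column)
        if len(phones) == 1:
            merged.append(column[0])  # all variants agree at this position
            continue
        match = next(
            (sym for sym, realisations in PSEUDO_PHONEMES.items() if realisations == phones),
            None,
        )
        if match is None:
            return None
        merged.append(match)
    merged = tuple(merged)
    # Consistency check: expansion must regenerate exactly the listed variants.
    if set(expand(merged)) != set(variants):
        return None
    return merged


if __name__ == "__main__":
    either = [("i:", "D", "@"), ("aI", "D", "@")]
    print(collapse(either))                # ('i:|aI', 'D', '@')
    print(expand(("i:|aI", "D", "@")))     # [('aI', 'D', '@'), ('i:', 'D', '@')]
    # Two independent differences would expand to four variants, not two:
    print(collapse([("i:", "D", "@"), ("aI", "D", "I")]))  # None
```

Under these assumptions, a rule extraction algorithm such as Default&Refine would be trained on the collapsed lexicon, treating each pseudo-phoneme as an ordinary output symbol, and its predictions would then be passed through `expand` to recover the variants.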
Bibliographic reference. Davel, Marelie / Barnard, Etienne (2006): "Developing consistent pronunciation models for phonemic variants", In INTERSPEECH-2006, paper 1760-Tue3A3O.4.