14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Machine Learning of Probabilistic Phonological Pronunciation Rules from the Italian CLIPS Corpus

Florian Schiel (1), Mary Stevens (1), Uwe D. Reichel (1), Francesco Cutugno (2)

(1) Bavarian Archive for Speech Signals, Germany
(2) Università di Napoli Federico II, Italy

We propose blending phonological concepts with technical analysis to improve the modeling and understanding of phonological processes. Based on the manual segmentation and labeling of the Italian CLIPS corpus, we automatically derive a probabilistic set of phonological pronunciation rules: a new alignment technique maps the phonological form of spontaneous sentences onto the phonetic surface form, and a machine-learning algorithm then calculates a set of phonological replacement rules together with their conditional probabilities. The resulting probabilistic rule set is critically analyzed and discussed with regard to regional Italian accents. The rule set presented here is also applied in the newly published web service WebMAUS, which allows a user to segment and phonetically label Italian speech via a simple web interface.
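The idea of replacement rules with conditional probabilities can be illustrated with a minimal sketch. It is not the algorithm used in the paper: it assumes the canonical and surface forms are already aligned symbol by symbol, uses a hypothetical one-symbol left and right context, and estimates each probability as a plain relative frequency. The function name, the toy data, and the "#" boundary symbol are all assumptions made for illustration.

```python
from collections import Counter

def estimate_rule_probabilities(aligned_utterances):
    """Estimate P(surface | left, canonical, right) by relative frequency.

    aligned_utterances: list of utterances, each a list of
    (canonical_symbol, surface_symbol) pairs that are assumed to be
    already aligned. A deletion could be encoded as an empty surface
    symbol (an assumption of this sketch, not taken from the paper).
    """
    rule_counts = Counter()     # counts of (context, surface)
    context_counts = Counter()  # counts of context alone
    for pairs in aligned_utterances:
        # Pad the canonical string with "#" as an utterance boundary.
        canon = ["#"] + [c for c, _ in pairs] + ["#"]
        for i, (c, s) in enumerate(pairs):
            context = (canon[i], c, canon[i + 2])  # (left, canonical, right)
            context_counts[context] += 1
            rule_counts[(context, s)] += 1
    return {
        (context, s): n / context_counts[context]
        for (context, s), n in rule_counts.items()
    }

# Toy data (invented): intervocalic /s/ is realized as [z] in 3 of 4 tokens.
voiced = [("k", "k"), ("a", "a"), ("s", "z"), ("a", "a")]
voiceless = [("k", "k"), ("a", "a"), ("s", "s"), ("a", "a")]
toy = [voiced, voiced, voiced, voiceless]

probs = estimate_rule_probabilities(toy)
p_voicing = probs[(("a", "s", "a"), "z")]
```

In this toy setting each rule reads "canonical symbol in context becomes surface symbol with probability p", so identity mappings such as /k/ to [k] receive probabilities alongside the actual replacements.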

Bibliographic reference.  Schiel, Florian / Stevens, Mary / Reichel, Uwe D. / Cutugno, Francesco (2013): "Machine learning of probabilistic phonological pronunciation rules from the Italian CLIPS corpus", In INTERSPEECH-2013, 1414-1418.