5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Semi-Automatic Phonetic Labelling of Large Corpora

O. Mella, D. Fohr

CRIN-CNRS & INRIA Lorraine, Bâtiment LORIA, Vandœuvre-lès-Nancy, France

This paper presents a methodology for semi-automatically labelling large corpora. The methodology rests on three main points: using several concurrent automatic stochastic labellers; decomposing the labelling of the whole corpus into an iterative refinement process; and building a labelling comparison procedure that takes phonological and acoustic-phonetic rules into account to evaluate the similarity of the various labellings of a sentence. After detailing these three points, we describe our HMM-based labelling tool and the application of the methodology to the Swiss French POLYPHON database.
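
As an illustration only, the idea of comparing the outputs of several labellers and keeping the sentences on which they agree might be sketched as follows. This is not the procedure of the paper: the segment representation, the `EQUIVALENT` table (a stand-in for phonological rules), the 20 ms boundary tolerance (a stand-in for acoustic-phonetic rules) and all function names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# One segment: (phone label, start time in seconds, end time in seconds).
Segment = Tuple[str, float, float]
Labelling = List[Segment]

# Hypothetical equivalence table: labels mapped to the same class are
# treated as interchangeable when two labellings are compared.
EQUIVALENT = {"e": "E"}


def canonical(label: str) -> str:
    """Map a label to its (assumed) equivalence class."""
    return EQUIVALENT.get(label, label)


def similar(a: Labelling, b: Labelling, tol: float = 0.02) -> bool:
    """True if both labellings have the same phone sequence (up to the
    equivalence table) and every boundary differs by at most `tol` seconds."""
    if len(a) != len(b):
        return False
    for (la, sa, ea), (lb, sb, eb) in zip(a, b):
        if canonical(la) != canonical(lb):
            return False
        if abs(sa - sb) > tol or abs(ea - eb) > tol:
            return False
    return True


def split_by_agreement(
    candidates: Dict[str, List[Labelling]], tol: float = 0.02
) -> Tuple[Dict[str, Labelling], List[str]]:
    """One pass over the corpus: accept the sentences on which all labellers
    agree with the first one, and return the others for a later pass."""
    accepted: Dict[str, Labelling] = {}
    pending: List[str] = []
    for sentence_id, labellings in candidates.items():
        reference = labellings[0]
        if all(similar(reference, other, tol) for other in labellings[1:]):
            accepted[sentence_id] = reference
        else:
            pending.append(sentence_id)
    return accepted, pending
```

The sketch covers a single pass. In an iterative scheme, the sentences returned as pending would be relabelled and compared again in a later pass, or handed to a human expert. Comparing every labelling against the first one is a simplification, since similarity with a tolerance is not transitive.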


Bibliographic reference. Mella, O. / Fohr, D. (1997): "Semi-automatic phonetic labelling of large corpora", in EUROSPEECH-1997, 1731-1734.