12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Speech Events are Recoverable from Unlabeled Articulatory Data: Using an Unsupervised Clustering Approach on Data Obtained from Electromagnetic Midsaggital Articulography (EMA)

Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze

Universität Stuttgart, Germany

Some models of speech perception, speech production, and language acquisition operate on a quasi-continuous representation of the acoustic speech signal. We investigate whether such models could profit from incorporating articulatory information in an analogous fashion. In particular, we examine how articulatory information, represented by EMA measurements, influences unsupervised phonetic categorization of speech. Combining the acoustic signal with non-synthetic, raw articulatory data, we present first results of a clustering procedure similar to those applied in numerous language acquisition and speech perception models. We observe that unlabeled articulatory data, i.e. data without previously assumed landmarks, yield good clustering results. Clustering is more effective for plosives than for vowels, which seems to support the motor view of speech perception.

Bibliographic reference. Duran, Daniel / Bruni, Jagoda / Dogil, Grzegorz / Schütze, Hinrich (2011): "Speech events are recoverable from unlabeled articulatory data: using an unsupervised clustering approach on data obtained from electromagnetic midsaggital articulography (EMA)", in INTERSPEECH-2011, 2201-2204.