Unsupervised Phoneme Segmentation of Previously Unseen Languages

Marco Vetter, Markus Müller, Fatima Hamlaoui, Graham Neubig, Satoshi Nakamura, Sebastian Stüker, Alex Waibel

In this paper we investigate the automatic detection of phoneme boundaries in audio recordings of an unknown language. This work is motivated by the needs of the BULB project, which aims to support linguists in documenting unwritten languages. Automatic phonemic transcription of recordings of the unwritten language is part of this effort. Multilingual phoneme recognizers cannot be used for transcription directly, because their phoneme inventory might not completely cover that of the new language. We therefore opted for a two-step approach inspired by work on speech synthesis for previously unknown languages: first we detect phoneme boundaries, and then we classify the detected segments into phoneme units. This paper addresses the first step, i.e., the detection of the phoneme boundaries. For this we still used multilingual and crosslingual phoneme recognizers, but only for the phoneme boundaries they detect, not for the phoneme identities. We measured the quality of the resulting segmentations using precision, recall and F-measure. We compared different configurations of mono- and multilingual phoneme recognizers with one another and against a monolingual gold standard. Finally, we applied the technique to Basaa, a Bantu language.
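As an illustration of how boundary precision, recall and F-measure can be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes that boundaries are given as times in seconds, that a hypothesized boundary counts as correct when it lies within a fixed tolerance of a reference boundary (the 20 ms default is an assumption, since the abstract does not state the matching criterion), and that each reference boundary can be matched at most once. The function name `boundary_scores` is hypothetical.

```python
from typing import Sequence, Tuple


def boundary_scores(
    reference: Sequence[float],
    hypothesis: Sequence[float],
    tolerance: float = 0.02,  # assumed tolerance window in seconds
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for detected boundaries."""
    ref = sorted(reference)
    hyp = sorted(hypothesis)

    # Greedy one-to-one matching over the two sorted lists.
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(hyp):
        diff = hyp[j] - ref[i]
        if abs(diff) <= tolerance:
            # Hit: consume both boundaries so neither is reused.
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            # Hypothesis boundary is too early to match: false alarm.
            j += 1
        else:
            # Reference boundary has no hypothesis nearby: miss.
            i += 1

    precision = matched / len(hyp) if hyp else 0.0
    recall = matched / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up boundary times, for illustration only.
    gold = [0.10, 0.25, 0.40, 0.60]
    detected = [0.11, 0.30, 0.41, 0.59, 0.75]
    p, r, f = boundary_scores(gold, detected)
    print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy example three of the five detected boundaries fall within the tolerance of a gold boundary, so precision is 3/5 and recall is 3/4. A wider tolerance raises both scores, so the window should always be reported together with the results.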

DOI: 10.21437/Interspeech.2016-1440

Cite as

Vetter, M., Müller, M., Hamlaoui, F., Neubig, G., Nakamura, S., Stüker, S., Waibel, A. (2016) Unsupervised Phoneme Segmentation of Previously Unseen Languages. Proc. Interspeech 2016, 3544-3548.

author={Marco Vetter and Markus Müller and Fatima Hamlaoui and Graham Neubig and Satoshi Nakamura and Sebastian Stüker and Alex Waibel},
title={Unsupervised Phoneme Segmentation of Previously Unseen Languages},
booktitle={Interspeech 2016},