Unsupervised Word Segmentation from Speech with Attention

Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier

We present a first attempt to perform attentional word segmentation from the speech signal, with the final goal of automatically identifying lexical units in a low-resource, unwritten language (UL). Our methodology assumes that recordings in the UL are paired with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones, which is then segmented using the soft alignments produced by a neural machine translation model. We evaluate on an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
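
The abstract does not spell out how soft alignments are turned into word boundaries. One simple reading, offered here only as an illustrative sketch and not as the authors' implementation, is to assign each pseudo-phone to the translation word that receives its highest attention weight and to place a boundary wherever that assignment changes. The function name `segment_by_attention` and the toy attention matrix are hypothetical.

```python
def segment_by_attention(units, attention):
    """Group consecutive units that share the same most-attended target word.

    units:     list of pseudo-phone labels (source side).
    attention: one row per unit; attention[i][j] is the soft-alignment
               weight between unit i and target word j (assumed layout).
    """
    if len(units) != len(attention):
        raise ValueError("need exactly one attention row per unit")

    segments = []
    previous = None
    for unit, weights in zip(units, attention):
        # Hard-assign the unit to the target word with the largest weight.
        best = max(range(len(weights)), key=weights.__getitem__)
        # A change of aligned target word opens a new segment.
        if best != previous:
            segments.append([])
            previous = best
        segments[-1].append(unit)
    return segments


# Toy example: four pseudo-phones aligned to a two-word translation.
toy_units = ["a1", "a2", "a3", "a4"]
toy_attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
]
toy_segments = segment_by_attention(toy_units, toy_attention)
```

In this toy case the first two units attend mostly to the first word and the last two to the second, so the sketch groups them into two segments. Note that the rule makes no monotonicity assumption: if the alignment jumps back to an earlier word, a new segment is opened again.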

DOI: 10.21437/Interspeech.2018-1308

Cite as: Godard, P., Boito, M.Z., Ondel, L., Berard, A., Yvon, F., Villavicencio, A., Besacier, L. (2018) Unsupervised Word Segmentation from Speech with Attention. Proc. Interspeech 2018, 2678-2682, DOI: 10.21437/Interspeech.2018-1308.

  author={Pierre Godard and Marcely Zanon Boito and Lucas Ondel and Alexandre Berard and François Yvon and Aline Villavicencio and Laurent Besacier},
  title={Unsupervised Word Segmentation from Speech with Attention},
  booktitle={Proc. Interspeech 2018},