Sparse Coding of Pitch Contours with Deep Auto-Encoders

Nicolas Obin, Julie Belião

This paper presents a sparse coding algorithm based on deep auto-encoders for the stylization and clustering of pitch contours. The main objective of the proposed algorithm is to learn a set of pitch templates that humans can easily interpret and whose combination efficiently approximates the observed pitch contours. The learning architecture relies on deep auto-encoders, which are commonly used to learn non-linear, low-dimensional latent representations that approximate the observed data. Specifically, the architecture is built from stacked auto-encoders, and the sparsity of the network is investigated (dropout, denoising auto-encoder, sparsity regularization) in order to learn a more robust and general representation of the pitch contours. The deep auto-encoding of pitch contours is illustrated and discussed on the TIMIT American-English speech database and compared with other existing stylization and clustering algorithms.
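To make the ingredients named in the abstract concrete, the following is a minimal sketch of a denoising auto-encoder with an L1 sparsity penalty on the hidden code. It is not the authors' implementation: it uses a single hidden layer rather than a stacked deep architecture, it omits dropout, and it is trained on synthetic contours rather than TIMIT. All names (`make_contours`, `init_params`, `forward`, `train`, `reconstruction_error`) and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def make_contours(n, dim, rng):
    """Synthetic stand-in data: random mixes of a linear rise and a single peak."""
    t = [k / (dim - 1) for k in range(dim)]
    rise = [2.0 * u - 1.0 for u in t]
    peak = [math.sin(math.pi * u) for u in t]
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append([a * r + b * p for r, p in zip(rise, peak)])
    return data


def init_params(dim, hidden, rng):
    """Small random encoder (W, b) and decoder (V, c) weights."""
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)],
        "b": [0.0] * hidden,
        "V": [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)],
        "c": [0.0] * dim,
    }


def forward(p, x):
    """Encode with tanh units, decode linearly; returns (code, reconstruction)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(p["W"], p["b"])]
    x_hat = [sum(v * hj for v, hj in zip(row, h)) + ci
             for row, ci in zip(p["V"], p["c"])]
    return h, x_hat


def reconstruction_error(p, data):
    """Mean of 0.5 * squared error over clean (uncorrupted) inputs."""
    total = 0.0
    for x in data:
        _, x_hat = forward(p, x)
        total += 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)


def train(p, data, epochs=100, lr=0.02, noise=0.05, l1=1e-3, seed=0):
    """Plain SGD on 0.5*||x_hat - x||^2 + l1*sum|h|; updates p in place."""
    rng = random.Random(seed)
    dim, hidden = len(p["c"]), len(p["b"])
    for _ in range(epochs):
        for x in data:
            # Denoising: corrupt the input, but reconstruct the clean contour.
            x_in = [xi + rng.gauss(0.0, noise) for xi in x]
            h, x_hat = forward(p, x_in)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Gradient w.r.t. the code, including the L1 sparsity term.
            d_h = [sum(p["V"][i][j] * err[i] for i in range(dim))
                   + l1 * (math.copysign(1.0, h[j]) if h[j] != 0 else 0.0)
                   for j in range(hidden)]
            d_pre = [d_h[j] * (1.0 - h[j] ** 2) for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    p["V"][i][j] -= lr * err[i] * h[j]
                p["c"][i] -= lr * err[i]
            for j in range(hidden):
                for k in range(dim):
                    p["W"][j][k] -= lr * d_pre[j] * x_in[k]
                p["b"][j] -= lr * d_pre[j]
    return p


if __name__ == "__main__":
    contours = make_contours(60, 8, random.Random(0))
    params = init_params(8, 3, random.Random(1))
    print("error before training:", reconstruction_error(params, contours))
    train(params, contours)
    print("error after training: ", reconstruction_error(params, contours))
```

In this sketch, each row of the decoder matrix `V` taken column-wise plays the role of a learned template: a contour is approximated as a bias plus a weighted sum of the columns of `V`, with the weights given by the hidden code. A stacked variant would, under the same assumptions, feed the code of one such layer as the input to the next.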

DOI: 10.21437/SpeechProsody.2018-161

Cite as: Obin, N., Belião, J. (2018) Sparse Coding of Pitch Contours with Deep Auto-Encoders. Proc. 9th International Conference on Speech Prosody 2018, 799-803, DOI: 10.21437/SpeechProsody.2018-161.

  author={Nicolas Obin and Julie Belião},
  title={Sparse Coding of Pitch Contours with Deep Auto-Encoders},
  booktitle={Proc. 9th International Conference on Speech Prosody 2018},