Sixth ISCA Workshop on Speech Synthesis

Bonn, Germany
August 22-24, 2007

Using Articulatory Position Data in Voice Transformation

Arthur R. Toth, Alan W. Black

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Articulatory position data is information about the locations of various articulators in the vocal tract. One form of it has been made freely available in the MOCHA database [1]. This data is interesting in that it provides direct information about the production of speech, but it is an open question whether it provides information beyond what can be derived from the audio signal, which is much easier to collect. Although there has been some success in improving small-scale speech recognition and in demonstrating mappings between articulatory positions and spectral features of the audio signal, there are many problems to which this data has not been applied. This work investigates whether articulatory position data can be used to improve voice transformation, the process of making speech from one person sound as if it had been spoken by another. Our experiments suggest that this is difficult with state-of-the-art voice transformation techniques: we obtained only a few positive results across a range of experiments. To achieve those results, it was necessary to modify our baseline voice transformation approach and/or to use features derived from the articulatory positions.
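To make the setup concrete, the sketch below shows one common family of voice transformation techniques: fitting a joint Gaussian over paired source and target frames and converting via the conditional mean, with the source spectral features optionally augmented by articulatory positions. This is an illustrative sketch on synthetic data, not the paper's actual method; all names, dimensions, and data here are assumptions.

```python
# Illustrative sketch (not the paper's method): joint-Gaussian mapping for
# voice conversion, with articulatory positions appended to the source
# spectral features. All data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def fit_joint_gaussian(X, Y):
    """Fit a single joint Gaussian over stacked [source; target] frames."""
    Z = np.hstack([X, Y])
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def convert(X, mu, cov, dx):
    """Conditional mean E[y | x] under the fitted joint Gaussian."""
    mu_x, mu_y = mu[:dx], mu[dx:]
    Sxx = cov[:dx, :dx]          # source-source covariance block
    Syx = cov[dx:, :dx]          # target-source covariance block
    W = Syx @ np.linalg.pinv(Sxx)
    return mu_y + (X - mu_x) @ W.T

# Toy parallel data: 500 frames of 12-dim "spectral" features plus 6-dim
# "articulatory" positions (source), and 12-dim target spectral frames.
n, d_spec, d_artic = 500, 12, 6
src_spec = rng.standard_normal((n, d_spec))
src_artic = rng.standard_normal((n, d_artic))
A = rng.standard_normal((d_spec + d_artic, d_spec))
tgt_spec = np.hstack([src_spec, src_artic]) @ A \
    + 0.1 * rng.standard_normal((n, d_spec))

# Augmented source features: spectral + articulatory, as in the question
# the abstract poses (does articulatory data add information?).
X_aug = np.hstack([src_spec, src_artic])
mu, cov = fit_joint_gaussian(X_aug, tgt_spec)
converted = convert(X_aug, mu, cov, X_aug.shape[1])
rmse = np.sqrt(np.mean((converted - tgt_spec) ** 2))
```

The conditional mean of a joint Gaussian is equivalent to a linear least-squares predictor; practical systems typically use Gaussian mixtures over many such components, but the single-component case shows the mapping in its simplest form.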

Reference

  1. Wrench, A. (1999), "The MOCHA-TIMIT articulatory database," Queen Margaret University College, Edinburgh, http://www.cstr.ed.ac.uk/artic/mocha.html


Bibliographic reference.  Toth, Arthur R. / Black, Alan W. (2007): "Using articulatory position data in voice transformation", In SSW6-2007, 182-187.