Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Cross-Language Evaluation of Voice-to-Phoneme Conversions for Voice-Tag Application in Embedded Platforms

Yan Ming Cheng, Changxue Ma, Lynette Melnar

Motorola Labs, USA

Previously, we proposed two voice-to-phoneme conversion algorithms for speaker-independent voice-tag creation, specifically targeted at applications on embedded platforms, an environment where CPU and memory resources are constrained [1]. These two algorithms (batch mode and sequential) were applied in a same-language context, i.e., acoustic model training, voice-tag creation, and voice-tag application were all performed in the same language.

In this paper, we investigate the cross-language application of these two voice-to-phoneme conversion algorithms, where the acoustic models are trained on a particular source language while the voice-tags are created and applied in a different target language. Here, both algorithms create phonetic representations of a target-language voice-tag based on the speaker-independent acoustic models of a distinct source language. Our experiments show that the recognition performance of these voice-tags varies with the source-target language pair, and that this variation reflects the predicted phonological similarity between the source and target languages. For the most similar language pairs, performance approaches that of the native-trained models and surpasses the native reference baseline.
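To make concrete what it means to derive a phonetic representation of a target-language utterance from source-language acoustic models, the following is a minimal sketch of one generic technique, free phone-loop Viterbi decoding. It is not the authors' batch-mode or sequential algorithm, which the abstract does not describe. The function name `phone_loop_decode`, the one-state-per-phone topology, the `insertion_penalty` value, and the assumption that per-frame log-likelihoods have already been computed by the source-language phone models are all illustrative assumptions.

```python
from typing import List, Sequence


def phone_loop_decode(
    frame_loglik: Sequence[Sequence[float]],
    phone_labels: Sequence[str],
    insertion_penalty: float = -2.0,
) -> List[str]:
    """Hypothetical free phone-loop Viterbi decoder (illustrative sketch only).

    frame_loglik[t][p] is ASSUMED to be the log-likelihood of frame t under
    source-language phone model p. Each phone is modeled as a single state:
    staying in the same phone is free, while entering a new phone adds
    insertion_penalty (expected to be <= 0).
    """
    num_frames = len(frame_loglik)
    if num_frames == 0:
        return []
    num_phones = len(phone_labels)

    # score[p]: best log score of any path ending in phone p at the current frame
    score = list(frame_loglik[0])
    backptrs: List[List[int]] = []  # backptrs[t - 1][p]: phone at frame t - 1 on the best path to p at frame t

    for t in range(1, num_frames):
        # In a free loop, any phone may follow any other, so the best
        # predecessor for a phone switch is simply the best-scoring phone.
        best_prev = max(range(num_phones), key=lambda q: score[q])
        new_score: List[float] = []
        bp: List[int] = []
        for p in range(num_phones):
            stay = score[p]
            switch = score[best_prev] + insertion_penalty
            if stay >= switch:
                new_score.append(stay + frame_loglik[t][p])
                bp.append(p)
            else:
                new_score.append(switch + frame_loglik[t][p])
                bp.append(best_prev)
        score = new_score
        backptrs.append(bp)

    # Trace back from the best final phone to recover the frame-level path.
    p = max(range(num_phones), key=lambda q: score[q])
    path = [p]
    for bp in reversed(backptrs):
        p = bp[p]
        path.append(p)
    path.reverse()

    # Collapse runs of identical frame labels into a phoneme string.
    phonemes: List[str] = []
    prev = None
    for p in path:
        if p != prev:
            phonemes.append(phone_labels[p])
        prev = p
    return phonemes


if __name__ == "__main__":
    # Toy log-likelihoods for 5 frames over 3 hypothetical source-language phones.
    frames = [
        [0.0, -5.0, -5.0],
        [0.0, -5.0, -5.0],
        [-5.0, 0.0, -5.0],
        [-5.0, 0.0, -5.0],
        [-5.0, -5.0, 0.0],
    ]
    print(phone_loop_decode(frames, ["k", "ae", "t"]))  # ['k', 'ae', 't']
```

Because the decoder can only emit phones from the source-language inventory, a target-language sound with no close source counterpart is forced onto the nearest available source phone. This is one plausible intuition for why voice-tag performance would track the phonological similarity of the language pair.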


Bibliographic reference.  Cheng, Yan Ming / Ma, Changxue / Melnar, Lynette (2006): "Cross-language evaluation of voice-to-phoneme conversions for voice-tag application in embedded platforms", In INTERSPEECH-2006, paper 1062-Mon1BuP.5.