4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
This paper presents an effort to explore the utility of prosodic information in language identification/discrimination (LID) tasks. We present our model and results from pair-wise LID tasks with English, Spanish, Japanese and Mandarin using multi-speaker elicited spontaneous speech and a selected set of prosodic parameters. These languages represent four different types of languages, varying in pitch use and timing. Parameters were designed to capture pitch and amplitude contours on a syllable-by-syllable basis, and to be insensitive to overall amplitude, pitch, and speaking rate. Results show that prosodic cues alone can distinguish between some language pairs with results comparable to many non-prosodic systems, indicating that prosodic parameters are highly useful in automatic LID. However, the statistical relationships between a number of individual features deduced from timing and pitch measurements are needed to begin to capture such complex perceptual events as rhythm. Strengths of individual prosodic parameters and classes of parameters (primarily pitch, secondarily duration and location) reflect differences between the four languages mostly as expected based on the linguistic literature, suggesting that effective use of prosodic parameters is aided by an understanding of the relationships between physical measurements and perceived linguistic events.
Bibliographic reference. Thymé-Gobbel, Ann E. / Hutchins, Sandra E. (1996): "On using prosodic cues in automatic language identification", In ICSLP-1996, 1768-1771.