Speech Prosody 2008
This paper examines the robustness of tonal and segmental cues in noise, using Mandarin monosyllables as test material. We investigate how varying levels of noise impair the recognition of syllabic tone, onset, vowel nucleus and coda, and which property of the syllable is the most stable in audio-only and audio-plus-video conditions.
A corpus of 220 frequent syllables was uttered by a male speaker of Mandarin and videotaped. Multi-talker babble noise was added to the resulting speech recordings at SNRs of 0, -3, -6 and -9 dB. In a perception test, subjects were asked to write down the Pinyin plus tone combination of the word they perceived. Results indicate, inter alia, that tonal information is more robust in noise than segmental information, and that the nuclear vowel is the most stable part of the syllable. Auditory-visual gain was observed for the segments but, in contrast to the results of an earlier study, not for the tones. Tonal recognition rates were also influenced by the type of nuclear vowel, with front vowels yielding the highest and back vowels the lowest rates.
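As an illustration of the noise-mixing step, the following is a minimal sketch of how a noise signal can be scaled to reach a target SNR before being added to speech. It is not the authors' procedure: the abstract does not say how signal levels were measured, so the sketch assumes SNR is defined on the RMS amplitude of the whole signal. The function names (`rms`, `mix_at_snr`, `measure_snr_db`) are hypothetical, and a sine tone and seeded random samples stand in for the actual speech recordings and babble noise.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def measure_snr_db(speech, noise):
    """SNR in dB, assuming whole-signal RMS levels (an assumption)."""
    return 20 * math.log10(rms(speech) / rms(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` to the target SNR relative to `speech`, then add it.

    Returns (mixed, scaled_noise). The speech level is left unchanged.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # From SNR_dB = 20 * log10(rms_speech / rms_noise):
    target_noise_rms = rms(speech) / (10 ** (target_snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Stand-in signals (hypothetical): a 200 Hz tone sampled at 16 kHz as
# "speech", and seeded uniform random samples as "noise".
SAMPLE_RATE = 16000
speech = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

# The four SNR conditions named in the abstract.
for snr in (0, -3, -6, -9):
    mixed, scaled_noise = mix_at_snr(speech, noise, snr)
    print(snr, round(measure_snr_db(speech, scaled_noise), 6))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions. Whether the original stimuli were prepared this way is not stated in the abstract.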
Bibliographic reference. Mixdorff, Hansjörg / Wang, Yuping / Hu, Yu (2008): "Robustness of tonal and segmental information in noise - auditory and visual contributions", In SP-2008, 261-264.