First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

The Role of Temporal Structure of Speech in Word Perception and Spoken Language Understanding

Yoshinori Kitahara (1), Yoh'ichi Tohkura (2)

(1) Central Research Laboratory, HITACHI, Ltd., Kokubunji-shi, Japan
(2) ATR Auditory and Visual Perception Research Laboratories, Kyoto, Japan

Speech has two aspects: phonetic features and prosodic features. The role of prosody in the cognitive processing of spoken language has been studied. Prosody comprises three kinds of features: pitch, amplitude and temporal structure. Among them, this study focuses on temporal structure, including pausing and phoneme duration. Perceptual experiments based on word detection tests were performed. The stimuli were excitation source signals composed of pulse trains and white noise, which carried no spectral information. The experimental results suggest that the continuous rhythm of spoken language, which consists not only of pausing information but also of phoneme duration, plays an important role in word detection. Furthermore, pausing information, including the silence just before the burst in a word-initial plosive, contributes to phrase and word segmentation. In spoken language understanding, these temporal cues assist word recognition in combination with contextual information at various levels of speech perception.
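
The kind of stimulus described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name excitation_signal, the segment labels, the constant 120 Hz pulse rate, the 16 kHz sample rate and the example durations are all assumptions made for illustration. The sketch only shows how a signal can keep a timing pattern (segment durations and pauses) while carrying no spectral envelope.

```python
import random


def excitation_signal(segments, sample_rate=16000, f0_hz=120.0, seed=0):
    """Build a source-only signal from (kind, duration_seconds) segments.

    kind is "voiced" (pulse train), "unvoiced" (white noise) or "pause"
    (silence). All names and default values here are hypothetical.
    """
    rng = random.Random(seed)
    # Number of samples between successive pulses (constant pitch assumed).
    period = max(1, round(sample_rate / f0_hz))
    signal = []
    for kind, duration in segments:
        n = round(duration * sample_rate)
        if kind == "voiced":
            # Unit pulses every `period` samples, zeros elsewhere.
            signal.extend(1.0 if i % period == 0 else 0.0 for i in range(n))
        elif kind == "unvoiced":
            # Gaussian white noise with an arbitrary small amplitude.
            signal.extend(rng.gauss(0.0, 0.1) for _ in range(n))
        elif kind == "pause":
            signal.extend([0.0] * n)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return signal


# Hypothetical timing pattern: vowel, pre-burst silence, burst noise, vowel.
segments = [("voiced", 0.10), ("pause", 0.05), ("unvoiced", 0.04), ("voiced", 0.12)]
stimulus = excitation_signal(segments)
```

Only the segment kinds and durations shape this signal, so any cue a listener could use from it is temporal. Whether the original stimuli preserved pitch or amplitude contours is not stated in the abstract, and the sketch makes no claim about that.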

Bibliographic reference. Kitahara, Yoshinori / Tohkura, Yoh'ichi (1990): "The role of temporal structure of speech in word perception and spoken language understanding", in ICSLP-1990, 389-392.