First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Acoustic, Perceptual, and Linguistic Analyses of Intonation Contours in Human/Machine Dialogues

Nancy A. Daly, Victor W. Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

This paper describes our research on quantifying and using the prosodic cues in the intonation contours of different types of queries found in human/machine problem-solving dialogues. We ask three fundamental questions: First, what factors determine intonation encoding for queries? Second, how do these factors interact? Third, what are the implications for speech understanding? Our analysis is based on a corpus of spontaneous speech containing several thousand sentences, collected during simulated human/machine dialogues in conjunction with the development of the MIT VOYAGER urban exploration and navigation system. In our corpus, we found that over 90% of the WH-questions, such as "Where is MIT?", have low final boundary tones. Of the YES-NO questions, such as "Is there a bank near Harvard?", on the other hand, only about 64% have high final boundary tones. Our results, based on classification and regression tree (CART) analyses, indicate that, while syntactic structure is the most important factor in predicting intonation contours, other factors such as the sentence's main verb and the speaker's sex are also important. We also performed perceptual experiments in which subjects were asked to rate the appropriateness of a simple yes-no answer on a 10-point scale. The results confirm that listeners vary their judgments of YES-NO appropriateness based on factors other than final boundary tone.

