Speech Prosody 2008

Campinas, Brazil
May 6-9, 2008

Holistic and Prosodic Representation of the Segmental Aspect of Speech

Nobuaki Minematsu (1), T. Nishimura (2), D. Saito (1), S. Asakawa (1), Y. Qiao (1)

(1) Graduate School of Engineering; (2) Graduate School of Medicine, The University of Tokyo, Japan

Speech communication involves several steps: encoding, transmission, and decoding. At each step, acoustic distortions are inevitably introduced by non-linguistic factors such as differences in age, gender, microphone, transmission line, room, and the auditory characteristics of the hearer's ears. Despite this large variability, humans can process speech very precisely. Recently, the first author proposed a novel representation of speech [1, 2] that is completely invariant to these factors. It focuses only on the dynamic motions in speech and discards the static features entirely. The validity of this representation for speech recognition has already been verified experimentally [3, 4, 5]. In this paper, we show that this representation of the segmental aspect of speech can be interpreted as a kind of holistic and prosodic feature, because it captures speech as music, i.e. as a timbre-based melody.
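This abstract does not say how the invariance is obtained. The following is only a rough illustration under stated assumptions: each speech event is summarized by a cepstral vector, and relations between events are measured with a plain Euclidean distance. That distance is a hypothetical stand-in and is not necessarily the measure used in [1, 2]. The sketch shows why a representation built purely from relations between events ignores a static offset shared by all vectors, such as the additive cepstral bias that a microphone or transmission line introduces.

```python
import math
import random


def distance_matrix(frames):
    """Pairwise Euclidean distances between feature vectors.

    Hypothetical stand-in for the relational measure of [1, 2]; only the
    relations between events are kept, not the absolute vectors.
    """
    n = len(frames)
    return [[math.dist(frames[i], frames[j]) for j in range(n)] for i in range(n)]


def add_channel_bias(frames, bias):
    """Simulate a static distortion: the same offset added to every vector,
    as a convolutional channel (microphone, line) does in the cepstral domain."""
    return [[x + b for x, b in zip(f, bias)] for f in frames]


random.seed(0)
# Five hypothetical 4-dimensional "cepstral" vectors standing in for speech events.
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
bias = [0.7, -1.2, 0.3, 2.0]

d_clean = distance_matrix(frames)
d_biased = distance_matrix(add_channel_bias(frames, bias))

# Largest change in any pairwise distance caused by the static offset.
max_diff = max(
    abs(a - b) for row_a, row_b in zip(d_clean, d_biased) for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

The static offset cancels in every pairwise difference, so the distance matrix is unchanged up to floating-point rounding, while the absolute vectors (the static features) differ. Invariance to other factors mentioned above, such as speaker differences, would need a richer relational measure than this toy example.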


Bibliographic reference.  Minematsu, Nobuaki / Nishimura, T. / Saito, D. / Asakawa, S. / Qiao, Y. (2008): "Holistic and prosodic representation of the segmental aspect of speech", In SP-2008, 169-172.