EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Extracting Caller Information from Voicemail

Geoffrey Zweig, Jing Huang, Mukund Padmanabhan

IBM T.J. Watson Research Center, USA

In this paper we address the problem of extracting the identities and phone numbers of callers from voicemail messages. Previous work on information extraction from speech includes spoken document retrieval and named entity detection. Our task differs from named entity detection in that the information of interest is only a subset of the named entities in a message, and the need to pick the correct subset makes the problem more difficult. In addition, a caller's identity may include information that is not typically associated with a named entity. We present two information extraction methods, one based on hand-crafted rules and one based on a maximum entropy model. We find that both systems perform well on manually derived transcriptions, and that the maximum entropy system can reliably identify the time intervals containing phone numbers even in the presence of significant decoding errors.
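
As a rough illustration of what a hand-crafted rule for phone numbers might look like when applied to a transcript, the following minimal Python sketch collects runs of spoken digit words. It is not the rule set used in the paper: the digit vocabulary, the function name `extract_phone_numbers`, and the `min_digits` threshold are all assumptions made for this example.

```python
import re

# Hypothetical vocabulary: spoken digit words as they might appear in a
# transcript. "oh" is treated as zero; this mapping is an assumption.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def extract_phone_numbers(transcript, min_digits=4):
    """Return a digit string for each run of consecutive digit words.

    Runs shorter than `min_digits` are discarded, so an isolated "one"
    in ordinary speech is not reported as a phone number.
    """
    numbers = []
    run = []
    # Lowercase the transcript and split it into alphabetic tokens.
    for token in re.findall(r"[a-z]+", transcript.lower()):
        if token in DIGIT_WORDS:
            run.append(DIGIT_WORDS[token])
        else:
            # A non-digit word ends the current run.
            if len(run) >= min_digits:
                numbers.append("".join(run))
            run = []
    # Flush a run that reaches the end of the transcript.
    if len(run) >= min_digits:
        numbers.append("".join(run))
    return numbers


message = "hi it's pat call me at five five five oh one two three thanks"
print(extract_phone_numbers(message))
```

A rule this simple depends on every digit word being transcribed correctly, so a single misrecognized word would split or drop a number. That sensitivity is one plausible reason a statistical model could be more robust to decoding errors, as the abstract reports for the maximum entropy system.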


Bibliographic reference.  Zweig, Geoffrey / Huang, Jing / Padmanabhan, Mukund (2001): "Extracting caller information from voicemail", In EUROSPEECH-2001, 291-294.