EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Multi-Keyword Spotting of Telephone Speech Using Orthogonal Transform-Based SBR and RNN Prosodic Model

Wern-Jun Wang (1,2), Chun-Jen Lee (2), Eng-Fong Huang (2), Sin-Horng Chen (1)

(1) National Chiao Tung University, Taiwan, ROC
(2) Chunghwa Telecommunication Laboratories, Taiwan, ROC

In this paper, an orthogonal transform-based signal bias removal (OTSBR) approach and an RNN prosodic model are proposed for multi-keyword spotting of telephone speech. OTSBR is employed in the pre-processing stage of acoustic decoding, where it estimates the channel bias in order to reduce the acoustic mismatch between the training and testing environments. The RNN prosodic model is adopted in the post-processing stage of acoustic decoding, where it detects word boundaries that are then used to reorder the keyword candidates produced by the keyword spotter. The proposed methods were evaluated in simulations on a real speech database collected from the Phone Directory Assistant Service developed at Chunghwa Telecommunication Laboratories (CTL-PDAS). Experimental results showed that a keyword detection rate of 71.0% and a top-5 keyword inclusion rate of 81.8% can be attained by incorporating OTSBR and the RNN prosodic model into the system.
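
To make the idea of channel bias estimation concrete, the following is a minimal sketch of generic codebook-based signal bias removal, not the orthogonal transform-based variant proposed in the paper, whose details the abstract does not give. It assumes the channel adds a constant offset to every feature vector, and it estimates that offset as the average residual between each frame and its nearest codeword. The function names, the toy codebook, and the iteration count are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def nearest_codeword(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> Sequence[float]:
    """Return the codeword with the smallest squared Euclidean distance to the frame."""
    return min(codebook, key=lambda c: sum((f - ci) ** 2 for f, ci in zip(frame, c)))


def estimate_bias(frames: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]],
                  iterations: int = 3) -> Vector:
    """Iteratively estimate an additive bias shared by all frames (illustrative only)."""
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        residual = [0.0] * dim
        for frame in frames:
            # Compensate with the current bias estimate, then quantize.
            compensated = [f - b for f, b in zip(frame, bias)]
            code = nearest_codeword(compensated, codebook)
            for d in range(dim):
                residual[d] += compensated[d] - code[d]
        # The mean residual is what the current estimate still misses.
        bias = [b + r / len(frames) for b, r in zip(bias, residual)]
    return bias


def remove_bias(frames: Sequence[Sequence[float]], bias: Sequence[float]) -> List[Vector]:
    """Subtract the estimated bias from every frame."""
    return [[f - b for f, b in zip(frame, bias)] for frame in frames]


# Toy data: clean frames sit on the codewords; the "channel" adds (1.0, -0.5).
codebook = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, -0.5], [11.0, 9.5], [1.0, -0.5], [11.0, 9.5]]
bias = estimate_bias(frames, codebook)
cleaned = remove_bias(frames, bias)
```

In this toy case the estimate converges after one pass because every frame is quantized to the correct codeword. With real telephone speech the codebook would be trained on clean features, and the estimate depends on how well the nearest-codeword assignment survives the mismatch.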

Bibliographic reference. Wang, Wern-Jun / Lee, Chun-Jen / Huang, Eng-Fong / Chen, Sin-Horng (2001): "Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model", in EUROSPEECH-2001, 2773-2776.