Fourth European Conference on Speech Communication and Technology

Madrid, Spain
September 18-21, 1995

Top-Down Speech Detection and N-Best Meaning Search in a Voice Activated Telephone Extension System

Kazuya Takeda, Shingo Kuroiwa, Masaki Naito, Seiichi Yamamoto

KDD R&D Laboratories, Kamifukuoka, Saitama, Japan

This paper proposes a robust speech detection method and an efficient N-best search method. The proposed endpoint detection method improves robustness to varying speech level by using the likelihood of partially matched word sequences, rather than the short-time speech level used in conventional methods. As a result, the degradation of recognition accuracy caused by endpoint detection failures is very small even at an SNR of 7 dB, where detection based on speech level does not work at all. The proposed N-best search method keeps candidates more efficiently by merging word sequences whose meanings are identical. Because this reduces the number of candidates, the time for reordering the N-best candidates is cut to one fourth without any degradation of recognition accuracy.
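The abstract does not give the details of the meaning-based merge, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each hypothesis is a word sequence with a log-likelihood score, and that a hypothetical function `meaning_of` maps a word sequence to a hashable meaning key. The names `merge_by_meaning`, `toy_meaning`, and `FILLERS`, along with the example words and scores, are invented for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# A hypothesis is (word sequence, log-likelihood score); higher is better.
Hypothesis = Tuple[List[str], float]


def merge_by_meaning(
    hypotheses: List[Hypothesis],
    meaning_of: Callable[[List[str]], Hashable],
    n: int,
) -> List[Hypothesis]:
    """Keep only the best-scoring hypothesis per meaning, then return the top n."""
    best: Dict[Hashable, Hypothesis] = {}
    for words, score in hypotheses:
        key = meaning_of(words)
        # Replace the stored hypothesis only if this one scores higher.
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    ranked = sorted(best.values(), key=lambda h: h[1], reverse=True)
    return ranked[:n]


# Assumption: in this toy task the meaning is the word sequence with
# filler words removed. A real system would use its own semantic mapping.
FILLERS = {"uh", "please", "mister"}


def toy_meaning(words: List[str]) -> Tuple[str, ...]:
    return tuple(w for w in words if w not in FILLERS)


hyps: List[Hypothesis] = [
    (["uh", "takeda", "please"], -10.0),
    (["mister", "takeda"], -11.0),
    (["takeda"], -12.0),
    (["kuroiwa", "please"], -13.0),
]

merged = merge_by_meaning(hyps, toy_meaning, n=2)
for words, score in merged:
    print(" ".join(words), score)
```

In this toy input, three of the four word sequences share one meaning, so only two distinct candidates remain for any later reordering step. This sketch merges a finished list, whereas the paper describes merging within the N-best search itself.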

Bibliographic reference.  Takeda, Kazuya / Kuroiwa, Shingo / Naito, Masaki / Yamamoto, Seiichi (1995): "Top-down speech detection and n-best meaning search in a voice activated telephone extension system", In EUROSPEECH-1995, 1075-1078.