INTERSPEECH 2006 - ICSLP
This paper describes two new methods for captioning broadcast news: online speech detection and dual-gender speech recognition. The proposed online speech detection performs dual-gender phoneme recognition and detects start-points and end-points from the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. As soon as a start-point is detected, the subsequent continuous speech recognizer, which runs gender-dependent acoustic models in parallel, starts its search, using gender change information from the preceding phoneme recognizer to reduce computational cost. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method reduced both false segmentations and recognition errors in comparison with a conventional method using adaptive energy thresholds. The proposed dual-gender speech recognition with the new speech detection significantly reduced the word error rate, by 11.2% relative to a conventional gender-independent system, while keeping the computational cost within real time.
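The likelihood-ratio idea behind the speech detection can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-frame log-likelihoods for the best phoneme hypothesis and for a non-speech model are already available, and it replaces the paper's decision rule with a simple clamped cumulative sum of their difference. The function name detect_segments, both thresholds, and the input values are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_segments(
    speech_ll: Sequence[float],
    nonspeech_ll: Sequence[float],
    start_threshold: float = 5.0,  # assumed value, not from the paper
    end_threshold: float = 5.0,    # assumed value, not from the paper
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected speech, end exclusive.

    A clamped cumulative log-likelihood ratio is accumulated frame by frame.
    Outside speech it tracks evidence for speech; inside speech it tracks
    evidence for non-speech. A boundary is placed at the frame where the
    accumulation that crossed the threshold began.
    """
    segments: List[Tuple[int, int]] = []
    in_speech = False
    score = 0.0      # clamped cumulative evidence for a state change
    candidate = 0    # frame where the current accumulation started
    start = 0
    n_frames = 0

    for t, (s, n) in enumerate(zip(speech_ll, nonspeech_ll)):
        n_frames = t + 1
        llr = s - n  # log of the speech / non-speech likelihood ratio
        if score == 0.0:
            candidate = t
        if not in_speech:
            score = max(0.0, score + llr)
            if score >= start_threshold:
                in_speech, start, score = True, candidate, 0.0
        else:
            score = max(0.0, score - llr)
            if score >= end_threshold:
                segments.append((start, candidate))
                in_speech, score = False, 0.0

    if in_speech:
        # The stream ended while still inside speech: close the open segment.
        segments.append((start, n_frames))
    return segments


if __name__ == "__main__":
    # Synthetic log-likelihoods: 3 non-speech, 6 speech, 4 non-speech frames.
    speech = [-5.0] * 3 + [-1.0] * 6 + [-5.0] * 4
    nonspeech = [-2.0] * 3 + [-4.0] * 6 + [-2.0] * 4
    print(detect_segments(speech, nonspeech))
```

Because each decision uses only the frames seen so far, the detector can run online; the delay is the number of frames needed for the cumulative score to cross a threshold, so lower thresholds shorten the delay at the cost of more false boundaries.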
Bibliographic reference. Imai, Toru / Sato, Shoei / Kobayashi, Akio / Onoe, Kazuo / Homma, Shinichi (2006): "Online speech detection and dual-gender speech recognition for captioning broadcast news", In INTERSPEECH-2006, paper 1103-Wed1CaP.1.